Re: unsubscribe
On Thu, 2007-05-10 at 10:05 +0100, Kainth, Sachin wrote: unsubscribe Hi Sachin, you need to send to a different mailing address: [EMAIL PROTECTED] HTH salu2 -- Thorsten Scherler thorsten.at.apache.org Open Source Java consulting, training and solutions
Re: Question about delete
but index file size not changed and maxDoc not changed. 2007/5/10, Nick Jenkin [EMAIL PROTECTED]: Hi James, As I understand it, numDocs is the number of documents in your index, and maxDoc is the most documents you have ever had in your index. You currently have no documents in your index by the looks of it, so your delete query must have deleted everything. That would be why you are getting no results. -Nick On 5/10/07, James liu [EMAIL PROTECTED] wrote: i use commands like this: curl http://localhost:8983/solr/update --data-binary '<delete><query>name:DDR</query></delete>' curl http://localhost:8983/solr/update --data-binary '<commit/>' and i get numDocs : 0 maxDoc : 1218819 When I search for something that existed before the delete, I find nothing. But the index file size has not changed and maxDoc has not changed. Why does this happen? -- regards jl -- - Nick -- regards jl
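For reference, the two update messages James posts are just small XML bodies; this is a minimal Python sketch (the helper name is hypothetical, but the element names are Solr's standard update syntax):

```python
def delete_by_query_xml(query):
    # Build the body of a Solr delete-by-query update message.
    return "<delete><query>" + query + "</query></delete>"

# A commit is a separate message; deletes only become visible once it is sent.
COMMIT_XML = "<commit/>"
```

Each body is POSTed to the /update URL, exactly as the curl commands above do.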
Re: Does solr support index which made by lucene 1.4.3
On 5/10/07, James liu [EMAIL PROTECTED] wrote: I tried it; it shows me this error information: Solr could support a Lucene 1.4.3 index if the schema was configured to match it. I see the following buried in your logs: java.lang.RuntimeException: Can't find resource 'solrconfig.xml' -Yonik
Custom response writer
I have written a custom response writer and added it to solrconfig.xml. When I run the program I can see the custom response writer is initialized, but when I run a search with the custom writer's name as the wt parameter, the search is executed but the response writer is not called (even the first line of the write function in the custom writer, which is log.info(...), is not written out). Any leads on what might be the cause? Thank you, Debra -- View this message in context: http://www.nabble.com/Costume-response-writer-tf3721357.html#a10412462 Sent from the Solr - User mailing list archive at Nabble.com.
fast update handlers
I'm trying to set up a system to have very low index latency (1-2 seconds), and one of the javadocs intrigued me: DirectUpdateHandler2 implements an UpdateHandler where documents are added directly to the main Lucene index as opposed to adding to a separate smaller index. The plain DirectUpdateHandler also has the same in its docs. Does this imply that there used to be another handler that could send docs to a smaller/faster index and then merge them in with a larger one, or that someone could in the future? I read through a good bit of the code and didn't see how it could be handled from a searcher perspective, but perhaps I'm missing some key piece. - will
Re: fast update handlers
On 5/10/07, Will Johnson [EMAIL PROTECTED] wrote: I'm trying to set up a system to have very low index latency (1-2 seconds) and one of the javadocs intrigued me: DirectUpdateHandler2 implements an UpdateHandler where documents are added directly to the main Lucene index as opposed to adding to a separate smaller index The plain DirectUpdateHandler also had the same in its docs. Does this imply that there used to be another handler that could send docs to a small/faster index and then merge them in with a larger one or that someone could in the future? That was the original design, before I thought of the current method in DUH2. DirectUpdateHandler was just meant to get things working to establish the external interface (it's only for testing... very slow at overwriting docs). Adding documents to a separate index and then merging would have no real indexing speed advantage (it's essentially what Lucene does anyway when adding to a large index). There would be some advantage for index distribution, but it would complicate things greatly. High latency is caused by segment merges... this would happen when you periodically had to merge the smaller index into the larger one anyway. You could do some other tricks for more predictable index times... set a large mergeFactor and then call optimize after you have added your batch of documents. Stay tuned though... there has been some work on a Lucene patch to do merging in a background thread. -Yonik
RE: fast update handlers
I guess I was more concerned with doing the frequent commits and how that would affect the caches. Say I have 2M docs in my main index but I want to add docs every 2 seconds, all while doing queries. If I do commits every 2 seconds I basically lose any caching advantage and my faceting performance goes down the tube. If however, I were to add things to a smaller index and then roll it into the larger one every ~30 minutes then I only take the hit of computing the larger filter caches on that interval. Further, if my smaller index were based on a RAMDirectory instead of an FSDirectory I assume computing the filter sets for the smaller index should be fast enough even every 2 seconds. - will -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley Sent: Thursday, May 10, 2007 9:49 AM To: solr-user@lucene.apache.org Subject: Re: fast update handlers On 5/10/07, Will Johnson [EMAIL PROTECTED] wrote: I'm trying to set up a system to have very low index latency (1-2 seconds) and one of the javadocs intrigued me: DirectUpdateHandler2 implements an UpdateHandler where documents are added directly to the main Lucene index as opposed to adding to a separate smaller index The plain DirectUpdateHandler also had the same in its docs. Does this imply that there used to be another handler that could send docs to a small/faster index and then merge them in with a larger one or that someone could in the future? That was the original design, before I thought of the current method in DUH2. DirectUpdateHandler was just meant to get things working to establish the external interface (it's only for testing... very slow at overwriting docs). Adding documents to a separate index and then merging would have no real indexing speed advantage (it's essentially what Lucene does anyway when adding to a large index). There would be some advantage for index distribution, but it would complicate things greatly. High latency is caused by segment merges... 
this would happen when you periodically had to merge the smaller index into the larger anyway. You could do some other tricks for more predictable index times... set a large mergeFactor and then call optimize after you have added your batch of documents. Stay tuned though... there has been some work on a lucene patch to do merging in a background thread. -Yonik
Re: fast update handlers
On 5/10/07, Will Johnson [EMAIL PROTECTED] wrote: I guess I was more concerned with doing the frequent commits and how that would affect the caches. Say I have 2M docs in my main index but I want to add docs every 2 seconds all while doing queries. if I do commits every 2 seconds I basically lose any caching advantage and my faceting performance goes down the tube. If however, I were to add things to a smaller index and then roll it into the larger one every ~30 minutes then I only take the hit of computing the larger filter caches on that interval. Further, if my smaller index were based on a RAMDirectory instead of an FSDirectory I assume computing the filter sets for the smaller index should be fast enough even every 2 seconds. There isn't currently any support for incrementally updating filters. -Yonik
Re: Question about delete
I believe that in Lucene, at least, deleting documents only marks them for deletion. The actual delete happens only after closing the IndexReader. Not sure about Solr. Ajanta. James liu wrote: but index file size not changed and maxDoc not changed. 2007/5/10, Nick Jenkin [EMAIL PROTECTED]: Hi James, As I understand it numDocs is the number of documents in your index, maxDoc is the most documents you have ever had in your index. You currently have no documents in your index by the looks, thus your delete query must have deleted everything. That would be why you are getting no results. -Nick On 5/10/07, James liu [EMAIL PROTECTED] wrote: i use commands like this curl http://localhost:8983/solr/update --data-binary '<delete><query>name:DDR</query></delete>' curl http://localhost:8983/solr/update --data-binary '<commit/>' and i get numDocs : 0 maxDoc : 1218819 when i search something which existed before the delete and find nothing. but index file size not changed and maxDoc not changed. why does it happen? -- regards jl -- - Nick
RE: fast update handlers
What about issuing separate commits to the index on a regularly scheduled basis? For example, you add documents to the index every 2 seconds, or however often, but these operations don't commit. Instead, you have a cron'd script or something that just issues a commit every 5 or 10 minutes or whatever interval you'd like. I had to do something similar when I was running a re-index of my entire dataset. My program wasn't issuing commits, so I just cron'd a commit for every half hour so it didn't overload the server. Thanks, Charlie -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley Sent: Thursday, May 10, 2007 9:07 AM To: solr-user@lucene.apache.org Subject: Re: fast update handlers On 5/10/07, Will Johnson [EMAIL PROTECTED] wrote: I guess I was more concerned with doing the frequent commits and how that would affect the caches. Say I have 2M docs in my main index but I want to add docs every 2 seconds all while doing queries. if I do commits every 2 seconds I basically lose any caching advantage and my faceting performance goes down the tube. If however, I were to add things to a smaller index and then roll it into the larger one every ~30 minutes then I only take the hit of computing the larger filter caches on that interval. Further, if my smaller index were based on a RAMDirectory instead of an FSDirectory I assume computing the filter sets for the smaller index should be fast enough even every 2 seconds. There isn't currently any support for incrementally updating filters. -Yonik
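The cron'd-commit idea Charlie describes amounts to POSTing a commit body on a schedule; here is a minimal Python sketch (the base URL is the example server's default, and `build_commit_request` is a hypothetical helper name):

```python
def build_commit_request(base_url="http://localhost:8983/solr"):
    # Return the (url, body) pair for a Solr commit; documents added since
    # the last commit only become searchable after this body is POSTed.
    return (base_url + "/update", "<commit/>")

# A cron job would POST the body, e.g.:
#   url, body = build_commit_request()
#   urllib.request.urlopen(url, data=body.encode())
```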
RE: fast update handlers
The problem is I want the newly added documents to be made searchable every 1-2 seconds, so I need the commits. I was hoping that the caches could be stored/tied to the IndexSearcher; then a MultiSearcher could take advantage of the multiple sub indexes and their respective caches. I think the best approach now will be to write a top level federator that can merge the large ~static index and the smaller more dynamic index. - will -Original Message- From: Charlie Jackson [mailto:[EMAIL PROTECTED] Sent: Thursday, May 10, 2007 10:53 AM To: solr-user@lucene.apache.org Subject: RE: fast update handlers What about issuing separate commits to the index on a regularly scheduled basis? For example, you add documents to the index every 2 seconds, or however often, but these operations don't commit. Instead, you have a cron'd script or something that just issues a commit every 5 or 10 minutes or whatever interval you'd like. I had to do something similar when I was running a re-index of my entire dataset. My program wasn't issuing commits, so I just cron'd a commit for every half hour so it didn't overload the server. Thanks, Charlie -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley Sent: Thursday, May 10, 2007 9:07 AM To: solr-user@lucene.apache.org Subject: Re: fast update handlers On 5/10/07, Will Johnson [EMAIL PROTECTED] wrote: I guess I was more concerned with doing the frequent commits and how that would affect the caches. Say I have 2M docs in my main index but I want to add docs every 2 seconds all while doing queries. if I do commits every 2 seconds I basically lose any caching advantage and my faceting performance goes down the tube. If however, I were to add things to a smaller index and then roll it into the larger one every ~30 minutes then I only take the hit of computing the larger filter caches on that interval. 
Further, if my smaller index were based on a RAMDirectory instead of a FSDirectory I assume computing the filter sets for the smaller index should be fast enough even every 2 seconds. There isn't currently any support for incrementally updating filters. -Yonik
Re: Question about delete
On 5/10/07, Ajanta Phatak [EMAIL PROTECTED] wrote: I believe in lucene at least deleting documents only marks them for deletion. The actual delete happens only after closing the IndexReader. Not sure about Solr Closing an IndexReader only flushes the list of deleted docids to the index... it doesn't actually delete them. Deletions only happen when the deleted docs segment is involved in a merge, or when an optimize is done (which is a merge of all segments). -Yonik
Re: Costume response writer
On 5/10/07, Debra [EMAIL PROTECTED] wrote: I have written a custom response writer and added it to solrconfig.xml. When I run the program I can see the custom response writer is initialized, but when I run a search with the custom writer's name as the wt parameter, the search is executed but the response writer is not called (even the first line of the write function in the custom writer, which is log.info(...), is not written out). Any leads on what might be the cause? That doesn't make sense... something like the dismax handler is the same to Solr as any other custom request handler. Perhaps look for the dismax handler init in the log files and compare it to your handler. -Yonik
Re: Requests per second/minute monitor?
Yes, that is possible, but we also monitor Apache, Tomcat, the JVM, and OS through JMX and other live monitoring interfaces. Why invent a real-time HTTP log analysis system when I can fetch /search/stats.jsp at any time? By number of rows fetched, do you mean number of documents matched? The log you describe is pretty useful. Ultraseek has something similar and that is the log most often used by admins. I'd recommend also logging the start and rows part of the request so you can distinguish between new queries and second page requests. If possible, make the timestamp the same as the HTTP access log so you can correlate the entries. wunder On 5/9/07 9:43 PM, Ian Holsman [EMAIL PROTECTED] wrote: Walter Underwood wrote: This is for monitoring -- what happened in the last 30 seconds. Log file analysis doesn't really do that. I would respectfully disagree. Log file analysis of each request can give you that, and a whole lot more. you could either grab the stats via a regular cron job, or create a separate filter to parse them real time. It would then let you grab more sophisticated stats if you choose to. What I would like to know is (and excuse the newbieness of the question) how to enable solr to log a file with the following data. - time spent (ms) in the request. - IP# of the incoming request - what the request was (and what handler executed it) - a status code to signal if the request failed for some reasons - number of rows fetched and - the number of rows actually returned is this possible? (I'm using tomcat if that changes the answer). regards Ian
dates times
After writing my 3rd parser in my third scripting language in as many months to go from unix timestamps to Solr Time (8601), I have to ask: shouldn't the date/time field type be more resilient? I assume there's a good reason that it's 8601 internally, but certainly it would be excellent for Solr to transcode different types into Solr Time. My main problem (as a normal Solr end user) is that it's hard to do math directly on 8601 dates or really parse them without specific packages. My XSL 2.0 parsers don't even like it without some massaging (forget about XSL 1.0). UNIX time (seconds since the epoch) is super easy, as are sortable delimitable strings like 20070510125403. I'm not advocating replacing 8601 as the known good Solr Time, just that some leeway be given in the parser to accept unix time or something else, and the conversion to 8601 happens internally. And a further dream is to have a strftime formatter in solrconfig for the query response, so I can always have my date fields come back as May 10th, 2007, 12:58pm. -Brian
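For what it's worth, the unix-time-to-Solr-Time conversion Brian keeps rewriting is a one-liner in Python; a minimal sketch (the function name is made up for illustration):

```python
import time

def unix_to_solr(ts):
    # Seconds since the epoch (UTC) -> Solr's ISO 8601 date format.
    return time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(ts))
```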
Re: fast update handlers
I don't know if this helps, but... Do *all* your queries need to include the fast updates? I have a setup where there are some cases that need the newest stuff but most cases can wait 5 mins (or so) In that case, I have two solr instances pointing to the same index files. One is used for updates and queries that need everything. The other is a read-only index that serves the majority of queries. What is nice about this is that you can set different cache sizes and auto-warming for the different cases. ryan Will Johnson wrote: The problem is I want the newly added documents to be made searchable every 1-2 seconds so I need the commits. I was hoping that the caches could be stored/tied to the IndexSearcher then a MultiSearcher could take advantage of the multiple sub indexes and their respective caches. I think the best approach now will be to write a top level federator that can merge the large ~static index and the smaller more dynamic index. - will -Original Message- From: Charlie Jackson [mailto:[EMAIL PROTECTED] Sent: Thursday, May 10, 2007 10:53 AM To: solr-user@lucene.apache.org Subject: RE: fast update handlers What about issuing separate commits to the index on a regularly scheduled basis? For example, you add documents to the index every 2 seconds, or however often, but these operations don't commit. Instead, you have a cron'd script or something that just issues a commit every 5 or 10 minutes or whatever interval you'd like. I had to do something similar when I was running a re-index of my entire dataset. My program wasn't issuing commits, so I just cron'd a commit for every half hour so it didn't overload the server. 
Thanks, Charlie -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley Sent: Thursday, May 10, 2007 9:07 AM To: solr-user@lucene.apache.org Subject: Re: fast update handlers On 5/10/07, Will Johnson [EMAIL PROTECTED] wrote: I guess I was more concerned with doing the frequent commits and how that would affect the caches. Say I have 2M docs in my main index but I want to add docs every 2 seconds all while doing queries. if I do commits every 2 seconds I basically lose any caching advantage and my faceting performance goes down the tube. If however, I were to add things to a smaller index and then roll it into the larger one every ~30 minutes then I only take the hit of computing the larger filter caches on that interval. Further, if my smaller index were based on a RAMDirectory instead of an FSDirectory I assume computing the filter sets for the smaller index should be fast enough even every 2 seconds. There isn't currently any support for incrementally updating filters. -Yonik
Re: dates times
On 5/10/07, Brian Whitman [EMAIL PROTECTED] wrote: After writing my 3rd parser in my third scripting language in so many months to go from unix timestamps to Solr Time (8601) I have to ask: shouldn't the date/time field type be more resilient? I assume there's a good reason that it's 8601 internally, but certainly it would be excellent for Solr to transcode different types into Solr Time. My main problem (as a normal Solr end user) is that it's hard to do math directly on 8601 dates or really parse them without specific packages. My XSL 2.0 parsers don't even like it without some massaging (forget about XSL 1.0.) UNIX time (seconds since the epoch) is super easy, as are sortable delimitable strings like 20070510125403. I'm not sure what delimitable means, but Solr datetimes _are_ essentially sortable inverse-magnitude like the above, with a few punctuation symbols thrown in. I have no XSLT-fu, but is it not possible to do regexp-replace s/[TZ:-]//g on the solrdate to get the above? I'm not advocating replacing 8601 as the known good Solr Time, just that some leeway be given in the parser to accept unix time or something else and the conversion to 8601 happens internally. And a further dream is to have a strftime formatter in solrconfig for the query response, so I can always have my date fields come back as May 10th, 2007, 12:58pm. Those are interesting ideas and it probably would not be difficult to create a patch if you were interested, but I'm curious: What about XSL makes what seems to me an elementary string-processing task so difficult? regards -Mike
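The regexp-replace Mike suggests, sketched in Python rather than XSLT (function name is hypothetical):

```python
import re

def solr_date_to_sortable(d):
    # Strip the ISO 8601 punctuation, leaving a plain sortable digit string.
    return re.sub(r"[TZ:\-]", "", d)
```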
RE: dates times
You can get at some of this functionality in the built-in XSLT 1.0 engine (Xalan) by using the EXSLT date-time extensions: see http://exslt.org/date/index.html, and for Xalan's implementation see http://xml.apache.org/xalan-j/extensionslib.html#exslt . There are some examples here: http://www-128.ibm.com/developerworks/library/x-exslt.html . I haven't tried this in Solr but I don't think there's any reason why it wouldn't work; I've used it in other Xalan-J environments, notably Cocoon. Peter -Original Message- From: Brian Whitman [mailto:[EMAIL PROTECTED] Sent: Thursday, May 10, 2007 11:49 AM To: solr-user@lucene.apache.org Subject: Re: dates times Those are interesting ideas and it probably would not be difficult to create a patch if you were interested, but I'm curious: What about XSL makes what seems to me an elementary string-processing task so difficult? Well, XSL 1.0 (which is the one that comes for free with Solr/java) doesn't handle dates and times at all. XSL 2.0 handles it well enough, but it's only supported through a GPL jar, which we can't distribute. It's more than string processing, anyway. I would want to convert the Solr Time 2007-03-15T00:41:52Z to March 15th, 2007 in a web app. I'd also like to say 'Posted 3 days ago'. In my vision of things, that work is done on Solr's side. (The former case with a strftime type formatter in solrconfig, the latter by having strftime return the day number this year.)
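The "Posted 3 days ago" case Brian mentions is simple once the date is parsed; a Python sketch (names are hypothetical, and `now` is passed in rather than taken from the clock, so the result is deterministic):

```python
from datetime import datetime

def days_ago(solr_date, now):
    # Parse a Solr ISO 8601 UTC date and count whole days back from `now`.
    then = datetime.strptime(solr_date, "%Y-%m-%dT%H:%M:%SZ")
    return (now - then).days
```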
Re: dates times
You can get at some of this functionality in the built-in XSLT 1.0 engine (Xalan) by using the EXSLT date-time extensions: see http://exslt.org/date/index.html, and for Xalan's implementation see http://xml.apache.org/xalan-j/extensionslib.html#exslt . The EXSLT stuff looks good, thanks! I'll have to try it out. That's only one direction though; I still want parsing of unix timestamp-like formats into the indexer on doc adds and updates. Just FYI, the license for the EXSLT stuff seems OK w/ the APL: http://lists.fourthought.com/pipermail/exslt-manage/2004-June/000603.html So if it works out we might want to put the date/time XSL stuff in the Solr distribution in lieu of shipping with an XSL 2.0 processor. Those are interesting ideas and it probably would not be difficult to create a patch if you were interested, but I'm curious: What about XSL makes what seems to me an elementary string-processing task so difficult? Well, XSL 1.0 (which is the one that comes for free with Solr/java) doesn't handle dates and times at all. XSL 2.0 handles it well enough, but it's only supported through a GPL jar, which we can't distribute. It's more than string processing, anyway. I would want to convert the Solr Time 2007-03-15T00:41:52Z to March 15th, 2007 in a web app. I'd also like to say 'Posted 3 days ago'. In my vision of things, that work is done on Solr's side. (The former case with a strftime type formatter in solrconfig, the latter by having strftime return the day number this year.) -- http://variogr.am/ [EMAIL PROTECTED]
Re: Costume response writer
This is from the log: ... INFO: adding queryResponseWriter jdbc=com.lss.search.request.JDBCResponseWriter 10/05/2007 21:11:39 com.lss.search.request.JDBCResponseWriter init INFO: Init JDBC response writer //This is added from the init of the class to see that it's actually finding the right one ... 10/05/2007 21:11:44 org.apache.solr.core.SolrCore execute INFO: null jdsn=4&start=0&q=white&wt=jdbc&qt=standard&rows=90 0 1442 10/05/2007 21:11:44 org.apache.solr.core.SolrCore close This is from the JDBCResponseWriter code: public void write(Writer writer, SolrQueryRequest request, SolrQueryResponse response) throws IOException { log.info("USING JDBC RESPONSE WRITER"); The line USING JDBC RESPONSE WRITER doesn't appear in the log. Thanks, Debra -- View this message in context: http://www.nabble.com/Costume-response-writer-tf3721357.html#a10418873 Sent from the Solr - User mailing list archive at Nabble.com.
Re: dates times
: It's more than string processing, anyway. I would want to convert the : Solr Time 2007-03-15T00:41:52Z to March 15th, 2007 in a web app. : I'd also like to say 'Posted 3 days ago'. In my vision of things, : that work is done on Solr's side. (The former case with a strftime : type formatter in solrconfig, the latter by having strftime return : the day number this year.) One of the early architecture/design principles of the Solr search APIs was: compute secondary info about a result if it's more efficient or easier to compute in Solr than it would be for a client to do it -- DocSet caches, facet counts, and sorting/pagination being great examples of things where Solr can do less work to get the same info out of raw data than a client app would, because of its low-level access to the data, and because of how much data would need to go over the wire for the client to do the same computation. ... that's largely just a little bit of historical trivia however; Solr has a lot of features now which might not hold up to that yardstick, but I mention it only to clarify one of the reasons Solr didn't have more configurable date formatting to start with. It has been on the TaskList since the start of incubation however... * a DateTime field (or Query Parser extension) that allows flexible input for easier human entered queries * allow alternate format for date output to ease client creation of date objects? One of the reasons I don't think anyone has tackled them yet is because it's hard to get a holistic view of a solution, because there are really several loosely related problems with date formatting issues: The first is a discussion of the internal format and what resolution the dates are stored at in the index itself. If you *know* that you never plan on querying with anything more fine grained than day resolution, storing your dates with only day resolution can make your index a lot smaller (and make date searches a lot faster). 
With the current DateField the same performance benefits can be achieved by rounding your dates before indexing them, but if we were to make it a config option on DateField itself to automatically round, we would need to take this info into account when parsing updates -- should the client be expected to know what precision each date field uses? Do they send dates expressed using the internal format, or as fully qualified times? Is it an error/warning to attempt to index more datetime precision than a field supports? The second is a discussion of external format (which seems to be what you are mostly discussing). The most trivial way to address this would be options on the ResponseWriters that allow them to be configured with DateFormatter strings they would use to process any date they return .. but that raises questions about the QueryParsing aspect as well ... should date formatting be a property of the response, or a property of the request, such that both input and output formats are identical? Third is how the discussions of the internal format and the external format shouldn't be treated as completely independent. It's tempting to say that there will be a clean abstraction between the two, that all client interaction will be done using configured external formatter(s) to create internal java Date objects, which will then be translated back to Strings by an internal formatter for the purpose of indexing (and querying), but what happens when a query expresses a date range too precise for the granularity expressed by the internal format? Do we match nothing/everything? ... What if the indexed granularity is *more* precise than the query granularity .. how do we know if a range query between March 6, 2007 and May 10, 2007 on a field that stores millisecond granularity is supposed to go from the first millisecond of each day or the last? Questions like these are why I'm glad Solr currently keeps it simple and makes people deal in absolutes .. less room for confusion :) -Hoss
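Hoss's "round your dates before indexing" workaround is straightforward because the ISO 8601 string is fixed-width; a Python sketch under that assumption (the function name is made up):

```python
def round_to_day(solr_date):
    # Truncate a full-precision Solr date down to day resolution
    # before indexing; the first 10 chars are always YYYY-MM-DD.
    return solr_date[:10] + "T00:00:00Z"
```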
Re: Costume response writer
: INFO: adding queryResponseWriter : jdbc=com.lss.search.request.JDBCResponseWriter : 10/05/2007 21:11:44 org.apache.solr.core.SolrCore execute : INFO: null jdsn=4&start=0&q=white&wt=jdbc&qt=standard&rows=90 0 1442 that's very strange ... the only thing that jumps out at me is the null there where the context path is supposed to be logged; it suggests that you aren't using the standard /select URL, so maybe this is a bug with some of the new request handler path based stuff? can you clarify: 1) which version of Solr you are using (the Solr Implementation Version from /admin/registry.jsp gives the best answer) 2) exactly what URL you are hitting to generate this request 3) what the solrconfig.xml looks like for your queryResponseWriter and requestHandler configurations 4) Lastly: what response does your client get? is it the default XML response, or just nothing at all? -Hoss
Re: dates times
On May 10, 2007, at 2:30 PM, Chris Hostetter wrote: Questions like these are whiy I'm glad Solr currently keeps it simple and makes people deal in absolutes .. less room for confusion :) I get all that, thanks for the great explanation. I imagine most of my problems can be solved with a custom field analyzer (converting other date strings to 8601 during indexing) and then XSL on the select?q= end (which we do anyway.) In other words, leaving core solr absolute with an option for different date analyzers. I see the need to not clutter it up, especially at this stage. What would, say, a filter that converted unix timestamps to 8601 before indexing as a solr.DateField look like? Is that a custom filter, or a tokenizer?
Re: dates times
On 5/10/07, Brian Whitman [EMAIL PROTECTED] wrote: On May 10, 2007, at 2:30 PM, Chris Hostetter wrote: Questions like these are whiy I'm glad Solr currently keeps it simple and makes people deal in absolutes .. less room for confusion :) I get all that, thanks for the great explanation. I imagine most of my problems can be solved with a custom field analyzer (converting other date strings to 8601 during indexing) and then XSL on the select?q= end (which we do anyway.) In other words, leaving core solr absolute with an option for different date analyzers. I see the need to not clutter it up, especially at this stage. What would, say, a filter that converted unix timestamps to 8601 before indexing as a solr.DateField look like? Is that a custom filter, or a tokenizer? That would be a custom filter which is currently only supported by text fields, so the XML output would be str instead of date (if that matters to you). One could also just store seconds or milliseconds in an int or long field. That's fine for small devel teams, but not ideal since it's a bit less expressive. The right approach for more flexible date parsing is probably to add more functionality to the date field and configure via optional attributes. -Yonik
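Yonik's alternative -- storing seconds or milliseconds in an int or long field -- needs the reverse conversion when reading results back; a Python sketch (function name is hypothetical):

```python
import calendar
import time

def solr_to_unix(d):
    # Parse a Solr ISO 8601 UTC date back to seconds since the epoch.
    return calendar.timegm(time.strptime(d, "%Y-%m-%dT%H:%M:%SZ"))
```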
RE: cwd requirement to run Solr with Tomcat
BTW, The Simple Example Install section in http://wiki.apache.org/solr/SolrTomcat leaves the unzipped directory apache-solr-nightly-incubating intact, but this is not needed after copying the solr.war and the example solr directory, is it? Can I edit the instruction to insert: rm -r apache-solr-nightly-incubating after the cp line? -kuro
Does Solr XSL writer work with Arabic text?
I'm trying to search an index of docs which have text fields in Arabic, using the XSL writer (wt=xslt&tr=example.xsl). But the Arabic text gets all garbled. Is the XSL writer known to work for Arabic text? Is anybody using it? -kuro
Re: Does Solr XSL writer work with Arabic text?
In example.xsl change the output type from <xsl:output media-type="text/html"/> to <xsl:output media-type="text/html; charset=UTF-8" encoding="UTF-8"/> and see if that helps. I had the same problem (different language.) If this works we should file a JIRA to fix it up in trunk. On May 10, 2007, at 4:13 PM, Teruhiko Kurosaka wrote: I'm trying to search an index of docs which have text fields in Arabic, using the XSL writer (wt=xslt&tr=example.xsl). But the Arabic text gets all garbled. Is the XSL writer known to work for Arabic text? Is anybody using it? -kuro -- http://variogr.am/ [EMAIL PROTECTED]
Re: dates times
: The right approach for more flexible date parsing is probably to add : more functionality to the date field and configure via optional : attributes.

Adding configuration options to DateField seems like it might ultimately be the right choice for changing the *internal* format, but assuming we want to keep the internal representation of DateField fixed and unconfigurable for the time being and address the various *external* formatting issues, I imagine the simplest things to tackle this (in a way that is consistent with the other datatypes) would be...

1) change DateField to support Analyzers. That way you could have separate analyzers for indexing vs querying just like a text field (so you could for example send Solr seconds since epoch when indexing, and query using MM/DD/). The Analyzers used would be responsible for producing Tokens which match what values the current DateField.toInternal() already considers legal (either a DateMath string or an iso8601 string).

(In general a DateTranslatingTokenFilter class would be a pretty cool addition to Lucene. It could take as constructor args two DateFormatters (one for parsing the incoming tokens, and one for formatting the outgoing tokens) and a boolean indicating whether its job was to replace matching tokens or inject duplicate tokens in the same position ... maybe another option indicating whether incoming Tokens that can't be parsed should be stripped or passed through ... the idea being that for something like DateField you would use KeywordTokenizer along with an instance of this to parse whatever format you wanted -- but when parsing generic text you might have several of these TokenFilters configured with different DateFormatters, so if they see a Token in the text that matches a known DateFormat they could inject the name of the month, or the day of the week, into the text at the same position.)

2) add options to the various QueryResponseWriters to control which format they use when writing fields out.
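The parse-or-pass-through core of that DateTranslatingTokenFilter idea could be sketched like this in plain Java (the class name and its eventual wiring into Lucene's TokenFilter API are hypothetical — only the translation logic is shown):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Sketch of the translation step: try to parse a token with an incoming
// format; on success emit it in the outgoing format, otherwise either
// drop it or pass it through unchanged (the "strip vs pass through" option).
public class DateTokenTranslator {
    private final DateTimeFormatter in;
    private final DateTimeFormatter out;
    private final boolean passThroughUnparsed;

    public DateTokenTranslator(String inPattern, String outPattern,
                               boolean passThroughUnparsed) {
        this.in = DateTimeFormatter.ofPattern(inPattern);
        this.out = DateTimeFormatter.ofPattern(outPattern);
        this.passThroughUnparsed = passThroughUnparsed;
    }

    /** Returns the translated token, the original token, or null (stripped). */
    public String translate(String token) {
        try {
            return out.format(LocalDate.parse(token, in));
        } catch (DateTimeParseException e) {
            return passThroughUnparsed ? token : null;
        }
    }

    public static void main(String[] args) {
        DateTokenTranslator t = new DateTokenTranslator(
            "MM/dd/yyyy", "yyyy-MM-dd'T'00:00:00'Z'", true);
        System.out.println(t.translate("05/10/2007")); // parsed and reformatted
        System.out.println(t.translate("not-a-date")); // passed through
    }
}
```

A real TokenFilter would call something like this per token and use the replace/inject boolean to decide token positions; that plumbing is omitted here.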
in the case of XmlResponseWriter it would still produce a date tag, but the value would be formatted according to the configuration. -Hoss
Re: dates times
(In general a DateTranslatingTokenFilter class would be a pretty cool addition to Lucene, it could take as constructor args two DateFormatters (one for parsing the incoming tokens, and one for formatting the outgoing If this happens, it would be nice (perhaps overkill) to have a chronic input filter: http://chronic.rubyforge.org/ the java port: https://jchronic.dev.java.net/ --- brian, for a quick easy solution, if you find working with unix timestamps easier, perhaps you just want to put the dates in as a SortableLongField and deal with the formatting that way.
Re: Costume response writer
hossman_lucene wrote: can you clarify: 1) which version of Solr you are using (the Solr Implementation Version from /admin/registry.jsp gives the best answer) ... -Hoss Just downloaded the latest nightly build and voilà, it's back on track (with the other bugs...) -- View this message in context: http://www.nabble.com/Costume-response-writer-tf3721357.html#a10421865 Sent from the Solr - User mailing list archive at Nabble.com.
RE: dates times
Regarding Hoss's points about the internal format, resolution of date-times, etc.: maybe a good starting point would be to implement the date-time algorithms of XML Schema (http://www.w3.org/TR/xmlschema-2/#isoformats), where these behaviors are spelled out in reasonably precise terms. There must be code somewhere that Solr could steal to help with this. This would mesh well with XSLT 2.0, and presumably other modern XML environments. peter

-Original Message- From: Chris Hostetter [mailto:[EMAIL PROTECTED] Sent: Thursday, May 10, 2007 12:30 PM To: solr-user@lucene.apache.org Subject: Re: dates times

: It's more than string processing, anyway. I would want to convert the : Solr time 2007-03-15T00:41:52Z to March 15th, 2007 in a web app. : I'd also like to say 'Posted 3 days ago'. In my vision of things, : that work is done on Solr's side. (The former case with a strftime : type formatter in solrconfig, the latter by having strftime return : the day number this year.)

One of the early architecture/design principles of the Solr search APIs was: compute secondary info about a result if it's more efficient or easier to compute in Solr than it would be for a client to do it -- DocSet caches, facet counts, and sorting/pagination being great examples of things where Solr can do less work to get the same info out of raw data than a client app would, because of its low-level access to the data, and because of how much data would need to go over the wire for the client to do the same computation. ... that's largely just a little bit of historical trivia however; Solr has a lot of features now which might not hold up to that yardstick, but I mention it only to clarify one of the reasons Solr didn't have more configurable date formatting to start with. It has been on the TaskList since the start of incubation however...
* a DateTime field (or Query Parser extension) that allows flexible input for easier human entered queries * allow alternate format for date output to ease client creation of date objects?

One of the reasons I don't think anyone has tackled them yet is because it's hard to get a holistic view of a solution, because there are really several loosely related problems with date formatting issues:

The first is a discussion of the internal format and what resolution the dates are stored at in the index itself. If you *know* that you never plan on querying with anything more fine grained than day resolution, storing your dates with only day resolution can make your index a lot smaller (and make date searches a lot faster). With the current DateField the same performance benefits can be achieved by rounding your dates before indexing them, but if we were to make it a config option on DateField itself to automatically round, we would need to take this info into account when parsing updates -- should the client be expected to know what precision each date field uses? Do they send dates expressed using the internal format, or as fully qualified times? Is it an error/warning to attempt to index more datetime precision than a field supports?

The second is a discussion of external format (which seems to be what you are mostly discussing). The most trivial way to address this would be options on the ResponseWriters that allow them to be configured with DateFormatter strings they would use to process any date they return .. but that raises questions about the QueryParsing aspect as well ... should date formatting be a property of the response, or a property of the request, such that both input and output formats are identical?

Third is how the discussions of the internal format and the external format shouldn't be treated as completely independent.
It's tempting to say that there will be a clean abstraction between the two: that all client interaction will be done using configured external formatter(s) to create internal java Date objects, which will then be translated back to Strings by an internal formatter for the purpose of indexing (and querying). But what happens when a query expresses a date range too precise for the granularity expressed by the internal format? Do we match nothing/everything? ... What if the indexed granularity is *more* precise than the query granularity .. how do we know if a range query between March 6, 2007 and May 10, 2007 on a field that stores millisecond granularity is supposed to go from the first millisecond of each day or the last?

Questions like these are why I'm glad Solr currently keeps it simple and makes people deal in absolutes .. less room for confusion :) -Hoss
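The "round your dates before indexing them" workaround from the first point above is a one-liner with java.time (shown here standalone and hypothetically named; how it would hang off a DateField config option is exactly the open question in the thread):

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Rounding a timestamp to day resolution before indexing: fewer distinct
// terms in the index, faster date range searches, at the cost of losing
// intra-day precision.
public class DayRounder {
    public static String roundToDay(String iso8601) {
        return Instant.parse(iso8601)
                      .truncatedTo(ChronoUnit.DAYS)
                      .toString();
    }

    public static void main(String[] args) {
        System.out.println(roundToDay("2007-05-10T13:45:12Z"));
    }
}
```

Doing this client-side keeps DateField unmodified, which sidesteps the parsing questions above -- the client simply never sends more precision than it intends to query with.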
RE: Facet only support english?
If my memory is correct, UTF-8 has been the default encoding per XML specification from a very early stage. If the XML parser is not defaulting to UTF-8 in absence of the encoding attribute, that means the XML parser has a bug, and the code should be corrected. (I don't have an objection to add the encoding attribute for clarity, however.) -kuro -Original Message- From: Walter Underwood [mailto:[EMAIL PROTECTED] Sent: Wednesday, May 09, 2007 4:33 PM To: solr-user@lucene.apache.org Subject: Re: Facet only support english? I didn't remember that requirement, so I looked it up. It was added in XML 1.0 2nd edition. Originally, unspecified encodings were open for auto-detection. Content type trumps encoding declarations, of course, per RFC 3023 and allowed by the XML spec. wunder On 5/9/07 4:19 PM, Mike Klaas [EMAIL PROTECTED] wrote: I thought that conformant parsers use UTF-8 as the default anyway: http://www.w3.org/TR/REC-xml/#charencoding -Mike
Re: Index Concurrency
Though, isn't there a recent patch to allow multiple indices under a single Solr instance in JIRA? Otis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Simpy -- http://www.simpy.com/ - Tag - Search - Share - Original Message From: Yonik Seeley [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Wednesday, May 9, 2007 6:32:33 PM Subject: Re: Index Concurrency On 5/9/07, joestelmach [EMAIL PROTECTED] wrote: My first intuition is to give each user their own index. My thinking here is that querying would be faster (since each user's index would be much smaller than one big index,) and, more importantly, that I would dodge any concurrency issues stemming from multiple threads trying to update the same index simultaneously. I realize that Lucene implements a locking mechanism to protect against concurrent access, but I seem to hit the lock access timeout quite easily with only a couple threads. After looking at solr, I would really like to take advantage of the many features it adds to Lucene, but it doesn't look like I'll be able to achieve multiple indexes. No, not currently. Start your implementation with just a single index... unless it is very large, it will likely be fast enough. Solr also handles all the concurrency issues, and you should never hit lock access timeout when updating from multiple threads. -Yonik
Re: Index Concurrency
Yes, coordination between the main index searcher, the index writer, and the index reader needed to delete other documents. Can you point me to any documentation/code that describes this implementation? That's weird... I've never seen that. The lucene write lock is only obtained when the IndexWriter is created. Can you post the relevant part of the log file where the exception happens? After doing some more testing, I believe it was a stale lock file that was causing me to have these lock issues yesterday - sorry for the false alarm :) Also, unless you have at least 6 CPU cores or so, you are unlikely to see greater throughput with 10 threads. If you add multiple documents per HTTP-POST (such that HTTP latency is minimized), the best setting would probably be nThreads == nCores. For a single doc per POST, more threads will serve to cover the latency and keep Solr busy. I agree with your thinking here. My requirement for a large number of threads is somewhat of an artifact of my current system design. I'm trying not to serialize the system's processing at the point of indexing. -- View this message in context: http://www.nabble.com/Index-Concurrency-tf3718634.html#a10424207 Sent from the Solr - User mailing list archive at Nabble.com.
Re: Question about delete
Got it, thanks Yonik. 2007/5/10, Yonik Seeley [EMAIL PROTECTED]: On 5/10/07, Ajanta Phatak [EMAIL PROTECTED] wrote: I believe in lucene at least deleting documents only marks them for deletion. The actual delete happens only after closing the IndexReader. Not sure about Solr Closing an IndexReader only flushes the list of deleted docids to the index... it doesn't actually delete them. Deletions only happen when the deleted docs segment is involved in a merge, or when an optimize is done (which is a merge of all segments). -Yonik -- regards jl
Solr concurrent commit not updated
Hello all, I have tested by using post.sh in the example directory to add xml documents into Solr. It works when I add them one by one. But when I have a lot of .xml files to be posted (say about 500-1000 files) and I wrote a shell script to call post.sh one by one, I found those xml files are not searchable after the post. But from the Solr admin page / statistics I found that it records committed numbers, yet numDocs is not updated. So why does it work fine when I use post.sh to post one xml, but behave differently when I use post.sh 500 times, one xml each time? Regards, David
Re: Solr concurrent commit not updated
You should know that id is a unique key -- adding a document with an existing id replaces it. 2007/5/11, David Xiao [EMAIL PROTECTED]: Hello all, I have tested by use post.sh in example directory to add xml documents into solr. It works when I add one by one. But when I have a lot of .xml file to be posted (say about 500-1000 files) and I wrote a shell script to call post.sh one by one. I found those xml files are not searchable after post. But from solr admin page / statistics I found that it records commited numbers. But numDocs is not updated. So why, when I use post.sh to post one xml it will be fine, but if I use post.sh for 500 times, each time one xml will be different behavior? Regards, David -- regards jl
RE: cwd requirement to run Solr with Tomcat
that section was never really intended to be *the* set of instructions for installing Solr on Tomcat, just the *simplest* set of things you could do to see it working; many additional things could be done (besides deleting the unzipped dir). If we start listing more things, people may get confused and assume those things *have* to be done. I've added some better comments to try and clarify that it's a minimal set of steps. : The Simple Example Install section in : http://wiki.apache.org/solr/SolrTomcat : leaves the unzipped directory apache-solr-nightly-incubating : intact, but this is not needed after copying the : solr.war and the example solr directory, is it? : Can I edit the instruction to insert: : rm -r apache-solr-nightly-incubating : after the cp line? -Hoss
Re: Question about delete
: Closing an IndexReader only flushes the list of deleted docids to the : index... it doesn't actually delete them. Deletions only happen when : the deleted docs segment is involved in a merge, or when an optimize : is done (which is a merge of all segments). just to clarify slightly, because deleted can be different things to different people... think of executing a delete command as logically deleting documents, by adding them to a list of documents to be ignored by IndexSearchers. A commit will ensure that deleted docs list is written to disk, and reopen the IndexSearcher, which will treat any documents in that list as if they didn't exist. When segment merges happen sometime in the future, document information is physically deleted, in the sense that the data associated with docs in the deleted list is actually removed from the index files, and disk/ram space is freed up. -Hoss
Re: Solr Sorting, merging/weighting sort fields
The boost is a way to adjust the weight of that field, just like you adjust the weight of any other field. If the boost is dominating the score, reduce the weight and vice versa. wunder On 5/10/07 9:22 PM, Chris Hostetter [EMAIL PROTECTED] wrote: : Is this correct? bf is a boosting function, so a function is needed there, no? : If I'm not missing something, the ^0.5 is just a boost, and popularity : is just a (numeric) field. So boosting a numeric field wouldn't make : sense, but applying it to a function would. Am I missing something? the function parser does the right thing when you give it a bare field name, from the javadocs... http://lucene.apache.org/solr/api/org/apache/solr/search/QueryParsing.html#parseFunction(java.lang.String,%20org.apache.solr.schema.IndexSchema) // Numeric fields default to correct type // (ie: IntFieldSource or FloatFieldSource) // Others use implicit ord(...) to generate numeric field value myfield you are correct about 0.5 being the boost; using either the _val_ hack on the SolrQueryParser or the bf param of dismax, the ^0.5 will be used as a boost on the resulting function query... qt=standard&q=%2Bfoo%20_val_:popularity^0.5 qt=dismax&q=foo&bf=popularity^0.5 -Hoss
RE: fast update handlers
: want to add docs every 2 seconds all while doing queries. if I do : commits every 2 seconds I basically lose any caching advantage and my : faceting performance goes down the tube. If however, I were to add : things to a smaller index and then roll it into the larger one every ~30 : minutes then I only take the hit on computing the larger filter caches searching across both of these indexes (the big and the little) would require something like a MultiReader, a way to unify DocSets between the two, and the ability to cache on the sub indexes and on the main MultiReader. Fortunately, a MultiReader is exactly what Lucene uses under the covers when dealing with an FSDirectory, so we're halfway there. Something like these might get us the rest of the way... https://issues.apache.org/jira/browse/LUCENE-831 https://issues.apache.org/jira/browse/LUCENE-743 -Hoss