[jira] Commented: (SOLR-258) Date based Facets

2007-07-13 Thread Pieter Berkel (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512372
 ] 

Pieter Berkel commented on SOLR-258:


I've just tried this patch and the results are impressive!

I agree with Ryan regarding the naming of 'pre', 'post' and 'inner'; using 
simple concrete words will make it easier for developers to understand the 
basic concepts.  At first I was a little confused about how the 'gap' parameter 
was used; perhaps a name like 'interval' would be more indicative of its purpose?

While on the topic of gaps / intervals, I can imagine a case where one might 
want facet counts over non-linear intervals, for instance obtaining results 
from: Last 7 days, Last 30 days, Last 90 days, Last 6 months.  
Obviously you can achieve this by setting facet.date.gap=+1DAY and then 
post-processing the results, but a much more elegant solution would be to allow 
facet.date.gap (or another suitably named param) to accept a 
(comma-delimited) set of explicit partition dates:

facet.date.start=NOW-6MONTHS/DAY
facet.date.end=NOW/DAY
facet.date.gap=NOW-90DAYS/DAY,NOW-30DAYS/DAY,NOW-7DAYS/DAY

It would then be trivial to calculate facet counts for the ranges specified 
above.
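
For illustration only (this is not code from the patch): a minimal sketch of how 
a comma-delimited list of partition dates could be turned into facet ranges, 
assuming the date-math expressions have already been resolved to java.util.Date 
values and are sorted between start and end.

import java.util.ArrayList;
import java.util.Date;
import java.util.List;

/** Builds contiguous [lower, upper) date ranges from explicit partition points. */
public class DateRangePartitioner {

    /** A simple lower/upper pair; labels would use the lower bound, as date facets do today. */
    public static class Range {
        public final Date lower;
        public final Date upper;
        public Range(Date lower, Date upper) {
            this.lower = lower;
            this.upper = upper;
        }
    }

    /**
     * @param start      resolved facet.date.start
     * @param end        resolved facet.date.end
     * @param partitions resolved partition dates, assumed sorted and strictly
     *                   between start and end
     */
    public static List<Range> partition(Date start, Date end, List<Date> partitions) {
        List<Range> ranges = new ArrayList<Range>();
        Date lower = start;
        for (Date p : partitions) {
            ranges.add(new Range(lower, p));
            lower = p;
        }
        ranges.add(new Range(lower, end));  // last range is capped at 'end'
        return ranges;
    }
}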

It would be useful to make the 'start' and 'end' parameters optional.  If not 
specified, 'start' should default to the earliest stored date value, and 'end' 
should default to the latest stored date value (assuming that's possible).  
It should probably return a 400 if 'gap' is not set.

My personal opinion is that 'end' should be a hard limit: the last gap should 
never go past 'end'.  Given that the facet label is always generated from the 
lower value in the range, I don't think truncating the last 'gap' will cause 
problems; however, it may be helpful to return the actual date value for end 
if it was specified as an offset of NOW.

What might be a problem is when both start and end dates are specified as 
offsets of NOW: the value of NOW may not be constant for both values.  In one 
of my tests, I set:

facet.date.start=NOW-12MONTHS
facet.date.end=NOW
facet.date.gap=+1MONTH

With some extra debugging output I can see that mostly the value of NOW is the 
same:

<str name="start">2006-07-13T06:06:07.397</str>
<str name="end">2007-07-13T06:06:07.397</str>

However occasionally there is a difference:

<str name="start">2006-07-13T05:48:23.014</str>
<str name="end">2007-07-13T05:48:23.015</str>

This difference alters the number of gaps calculated (+1 when the NOW values 
differ for start and end).  Not sure how this could be fixed, but as you 
mentioned above, it will probably involve changing ft.toExternal(ft.toInternal(...)).
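
A minimal illustration of the root cause (again, not code from the patch): if 
start and end are each resolved against their own new Date(), the two notions of 
NOW can differ by a millisecond or more; resolving both endpoints from a single 
captured 'now' removes the drift.

import java.util.Calendar;
import java.util.Date;

public class NowDriftExample {
    public static void main(String[] args) {
        // Problematic: two separate notions of NOW, possibly a millisecond (or more) apart.
        Date start1 = monthsAgo(new Date(), 12);
        Date end1   = new Date();

        // Consistent: capture NOW once and derive both endpoints from it.
        Date now    = new Date();
        Date start2 = monthsAgo(now, 12);
        Date end2   = now;

        System.out.println((end1.getTime() - start1.getTime()) + " vs "
                         + (end2.getTime() - start2.getTime()));
    }

    static Date monthsAgo(Date now, int months) {
        Calendar cal = Calendar.getInstance();
        cal.setTime(now);
        cal.add(Calendar.MONTH, -months);
        return cal.getTime();
    }
}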

Thanks again for creating this useful addition, I'll try to test it a bit more 
and see if I can find anything else.


 Date based Facets
 -

 Key: SOLR-258
 URL: https://issues.apache.org/jira/browse/SOLR-258
 Project: Solr
  Issue Type: New Feature
Reporter: Hoss Man
Assignee: Hoss Man
 Attachments: date_facets.patch, date_facets.patch, date_facets.patch, 
 date_facets.patch, date_facets.patch


 1) Allow clients to express concepts like...
 * give me facet counts per day for every day this month.
 * give me facet counts per hour for every hour of today.
 * give me facet counts per hour for every hour of a specific day.
 * give me facet counts per hour for every hour of a specific day, and give me 
 facet counts for the number of matches before that day, or after that day. 
 2) Return all data in a way that makes it easy to use to build filter queries 
 on those date ranges.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Reopened: (SOLR-298) NGramTokenFilter missing in trunk

2007-07-13 Thread Thomas Peuss (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Peuss reopened SOLR-298:
---


Sorry, I did not really state that this issue is about Solr. In the Solr trunk 
I don't find the ngram filters:

[EMAIL PROTECTED] /cygdrive/c/Projects/solr-trunk2
$ grep -ril ngramfilter *

[EMAIL PROTECTED] /cygdrive/c/Projects/solr-trunk2
$

This was a fresh checkout.

 NGramTokenFilter missing in trunk
 -

 Key: SOLR-298
 URL: https://issues.apache.org/jira/browse/SOLR-298
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Thomas Peuss
Priority: Minor

 In one of the patches for SOLR-81 are Ngram TokenFilters. Only the Tokenizers 
 seem to have made it into Subversion (trunk). What happened to them?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



nightly builds / solrj-lib

2007-07-13 Thread Ryan McKinley

I just took a look at the files contained in:
http://people.apache.org/builds/lucene/solr/nightly/

the dist directory does not include the .jar files needed for solrj. 
Can we modify the script to include 'solrj-lib'?


ryan


[jira] Updated: (SOLR-139) Support updateable/modifiable documents

2007-07-13 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley updated SOLR-139:
---

Attachment: SOLR-269+139-ModifiableDocumentUpdateProcessor.patch

This patch implements modifiable documents in the SOLR-269 update processor chain.

If the request does not have a 'mode' string, the 
ModifyDocumentProcessorFactory does not add a processor to the chain.
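
A rough sketch of that pattern follows; the class structure and method signatures 
here are assumptions for illustration, not the actual patch. The factory inspects 
the request params and simply hands back the next processor in the chain when no 
'mode' is present.

import java.util.Map;

// Hypothetical illustration of the "skip myself when not needed" pattern used by
// ModifyDocumentProcessorFactory; the real SOLR-269/SOLR-139 classes and method
// names may differ from what is sketched here.
public class SkipWhenUnusedExample {

    interface UpdateProcessor {
        void processAdd(Object doc);
    }

    static class ModifyDocumentProcessor implements UpdateProcessor {
        private final String mode;
        private final UpdateProcessor next;
        ModifyDocumentProcessor(String mode, UpdateProcessor next) {
            this.mode = mode;
            this.next = next;
        }
        public void processAdd(Object doc) {
            // ... apply the OVERWRITE/INCREMENT logic described by 'mode' ...
            next.processAdd(doc);
        }
    }

    /** Returns a processor for this request, or just 'next' when there is no 'mode'. */
    static UpdateProcessor getProcessor(Map<String, String> params, UpdateProcessor next) {
        String mode = params.get("mode");
        if (mode == null || mode.length() == 0) {
            return next;  // no 'mode' param: leave the chain untouched
        }
        return new ModifyDocumentProcessor(mode, next);
    }
}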



 Support updateable/modifiable documents
 ---

 Key: SOLR-139
 URL: https://issues.apache.org/jira/browse/SOLR-139
 Project: Solr
  Issue Type: Improvement
  Components: update
Reporter: Ryan McKinley
Assignee: Ryan McKinley
 Attachments: SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-ModifyInputDocuments.patch, SOLR-139-XmlUpdater.patch, 
 SOLR-269+139-ModifiableDocumentUpdateProcessor.patch


 It would be nice to be able to update some fields on a document without 
 having to insert the entire document.
 Given the way lucene is structured, (for now) one can only modify stored 
 fields.
 While we are at it, we can support incrementing an existing value - I think 
 this only makes sense for numbers.
 for background, see:
 http://www.nabble.com/loading-many-documents-by-ID-tf3145666.html#a8722293

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: [jira] Updated: (SOLR-240) java.io.IOException: Lock obtain timed out: SimpleFSLock

2007-07-13 Thread Will Johnson

Hooray, and very cool.  I didn't know you only needed a locking mechanism if 
you have multiple index writers, so the use of NoLock by default makes perfect 
sense.

A quick stability update: Since I first submitted the patch ~2 months
ago we've had 0 lockups with it running in all our test environments.  

- will


[jira] Resolved: (SOLR-298) NGramTokenFilter missing in trunk

2007-07-13 Thread Otis Gospodnetic (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic resolved SOLR-298.
---

Resolution: Fixed

Thomas - not everything that was in SOLR-81 earlier was committed to Solr.  
Some was committed to Lucene in LUCENE-759.


 NGramTokenFilter missing in trunk
 -

 Key: SOLR-298
 URL: https://issues.apache.org/jira/browse/SOLR-298
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Thomas Peuss
Priority: Minor

 In one of the patches for SOLR-81 are Ngram TokenFilters. Only the Tokenizers 
 seem to have made it into Subversion (trunk). What happened to them?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: [jira] Updated: (SOLR-240) java.io.IOException: Lock obtain timed out: SimpleFSLock

2007-07-13 Thread Yonik Seeley

On 7/13/07, Will Johnson [EMAIL PROTECTED] wrote:

Hooray, and very cool.  I didn't know you only needed a locking
mechanism if you only have multiple index writers so the use of NoLock
by default makes perfect sense.


For Lucene, you do (did, before the lockless commits patch) need locking
(a read lock) even to open an index with a reader.  The write lock is
still also needed to avoid a reader changing the index via deletion at
the same time a writer is.  Solr coordinates this at a higher level,
hence it's not really needed.

-Yonik


Re: Rich Docs Indexing

2007-07-13 Thread Erik Hatcher


On Jul 13, 2007, at 10:31 AM, Eric Pugh wrote:
I wanted to see if I could get some momentum going on seeing if  
this is something that the committers want in Solr 1.3...   I'd  
like to write up a wiki page similar to http://wiki.apache.org/solr/UpdateCSV 
page that would give folks a chance to see what this code 
can do, but highlight that it is a wiki page about just a patch  
file?  Would this be okay, or misleading to folks?


Eric - kudos!  Thanks for this contribution and effort to document  
it.  There is already precedent here - the Field Collapsing  
contribution has worked thus far too:


http://wiki.apache.org/solr/FieldCollapsing

So go for it!

	Erik, who will one day look at this contribution, but not for a few  
weeks, sorry




[jira] Commented: (SOLR-269) UpdateRequestProcessorFactory - process requests before submitting them

2007-07-13 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512502
 ] 

Yonik Seeley commented on SOLR-269:
---

 How do you all feel about the basic structure?
It's a go!
It will get more complicated, I think, with document modification (SOLR-139)

 While it would be nice to keep the base stuff package protected,

I'm more concerned with the other parts of the API that this moves 
front-and-center... 
mainly UpdateCommand and friends... those were really quick hacks on my part 
since there were no custom update handlers at the time.

 One clever change is to have the LogUpdateProcessorFactory skip building a 
 LogUpdateProcessor if the log level is not INFO rather than keeping a flag.

Nice!
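
For readers unfamiliar with the trick, here is a toy version; the class, interface 
and logger names are illustrative only, not the actual patch code.

import java.util.logging.Level;
import java.util.logging.Logger;

// Toy version of the "don't even build the processor" idea: if INFO logging is
// disabled, the factory returns the rest of the chain unchanged instead of
// creating a logging wrapper that would check a flag on every document.
public class LogProcessorFactoryExample {

    interface UpdateProcessor {
        void processAdd(Object doc);
    }

    private static final Logger log =
        Logger.getLogger(LogProcessorFactoryExample.class.getName());

    static UpdateProcessor getProcessor(final UpdateProcessor next) {
        if (!log.isLoggable(Level.INFO)) {
            return next;  // logging disabled: skip the wrapper entirely
        }
        return new UpdateProcessor() {
            public void processAdd(Object doc) {
                log.info("add: " + doc);
                next.processAdd(doc);
            }
        };
    }
}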

By the way, I also need SOLR-139; is it easy for you to commit this first to 
limit the size and scope of that patch?

 UpdateRequestProcessorFactory - process requests before submitting them
 ---

 Key: SOLR-269
 URL: https://issues.apache.org/jira/browse/SOLR-269
 Project: Solr
  Issue Type: New Feature
Reporter: Ryan McKinley
Assignee: Ryan McKinley
 Fix For: 1.3

 Attachments: SOLR-269-UpdateRequestProcessorFactory.patch, 
 SOLR-269-UpdateRequestProcessorFactory.patch, 
 SOLR-269-UpdateRequestProcessorFactory.patch, UpdateProcessor.patch


 A simple UpdateRequestProcessor was added to a bloated SOLR-133 commit. 
 An UpdateRequestProcessor lets clients plug in logic after a document has 
 been parsed and before it has been 'updated' with the index.  This is a good 
 place to add custom logic for:
  * transforming the document fields
  * fine-grained authorization (can user X update document Y?)
  * allow update, but not delete (by query?)
<requestHandler name="/update" class="solr.StaxUpdateRequestHandler">
  <str name="update.processor.class">org.apache.solr.handler.UpdateRequestProcessor</str>
  <lst name="update.processor.args">
    ... (optionally pass in arguments to the factory init method) ...
  </lst>
</requestHandler>
 http://www.nabble.com/Re%3A-svn-commit%3A-r547495---in--lucene-solr-trunk%3A-example-solr-conf-solrconfig.xml-src-java-org-apache-solr-handler-StaxUpdateRequestHandler.java-src-java-org-apache-solr-handler-UpdateRequestProcessor.jav-tf3950072.html#a11206583

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Rich Docs Indexing

2007-07-13 Thread Eric Pugh

Hi all,

I've been working with the RichDocumentRequestHandler 
(http://issues.apache.org/jira/browse/SOLR-284) for the past few weeks, and it  
seems to be working quite well.  We discovered that when we throw a  
27 MB PDF document at it we needed to beef up the Java Heap size, and  
we haven't come up with a great solution for handling PDF documents  
that have a password on them, beyond not indexing them.


I wanted to see if I could get some momentum going on whether this  
is something that the committers want in Solr 1.3...   I'd like to  
write up a wiki page similar to the http://wiki.apache.org/solr/UpdateCSV  
page that would give folks a chance to see what this code can do, but  
highlight that it is a wiki page about just a patch file.  Would this  
be okay, or misleading to folks?


I've updated the patch to revision 555996.

Thanks for your consideration!   PS, is anyone going to be at OSCON  
in two weeks?  I'd love to meet up with some other Solr folks.


Eric

---
Principal
OpenSource Connections
Site: http://www.opensourceconnections.com
Blog: http://blog.opensourceconnections.com
Cell: 1-434-466-1467






[jira] Updated: (SOLR-240) java.io.IOException: Lock obtain timed out: SimpleFSLock

2007-07-13 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-240:
--

Attachment: IndexWriter2.patch

good point about recommending 'single' in the event of concurrency bugs.

i've never really looked at the internals of the LockFactories so i'm going to 
punt on the subclass idea for now (i like it, i just don't have time to do it), 
but we can always redefine single later.  (i'll open another bug if we're 
okay with committing this new patch as is)

revised patch just changes the wording and suggested value in solrconfig.xml


objections?

 java.io.IOException: Lock obtain timed out: SimpleFSLock
 

 Key: SOLR-240
 URL: https://issues.apache.org/jira/browse/SOLR-240
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 1.2
 Environment: windows xp
Reporter: Will Johnson
 Attachments: IndexWriter.patch, IndexWriter2.patch, 
 IndexWriter2.patch, IndexWriter2.patch, stacktrace.txt, ThrashIndex.java


 when running the soon to be attached sample application against solr it will 
 eventually die.  this same error has happened on both windows and rh4 linux.  
 the app is just submitting docs with an id in batches of 10, performing a 
 commit then repeating over and over again.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-139) Support updateable/modifiable documents

2007-07-13 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512558
 ] 

Ryan McKinley commented on SOLR-139:


 the update handler knows much more about the index than we do outside

Yes.  The patch I just attached only deals with documents that are already 
committed.  It uses req.getSearcher() to find existing documents.  

Beyond finding committed or non-committed Documents, is there anything else that 
it can do better?  

Is it enough to add something to UpdateHandler to ask for a pending or 
committed document by uniqueId?

I like having the actual document manipulation happening in the Processor 
because it is an easy place to put in other things like grabbing stuff from a 
SQL database.  

 Support updateable/modifiable documents
 ---

 Key: SOLR-139
 URL: https://issues.apache.org/jira/browse/SOLR-139
 Project: Solr
  Issue Type: Improvement
  Components: update
Reporter: Ryan McKinley
Assignee: Ryan McKinley
 Attachments: SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, 
 SOLR-139-XmlUpdater.patch, 
 SOLR-269+139-ModifiableDocumentUpdateProcessor.patch


 It would be nice to be able to update some fields on a document without 
 having to insert the entire document.
 Given the way lucene is structured, (for now) one can only modify stored 
 fields.
 While we are at it, we can support incrementing an existing value - I think 
 this only makes sense for numbers.
 for background, see:
 http://www.nabble.com/loading-many-documents-by-ID-tf3145666.html#a8722293

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-139) Support updateable/modifiable documents

2007-07-13 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512553
 ] 

Yonik Seeley commented on SOLR-139:
---

A general issue with update processors and modifiable documents, and with 
keeping this stuff out of the update handler, is that the update handler knows 
much more about the index than we do outside, and that constrains implementation 
(and performance optimizations).

For example, if modifiable documents were implemented in the update handler, 
and the old version of the document hasn't been committed yet, the update 
handler could buffer the complete modify command to be done at a later time 
(the *much* slower alternative is closing the writer and opening the reader to 
get the latest stored fields, then closing the reader and re-opening the 
writer).
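
A sketch of that buffering idea, purely for illustration (none of these class or 
method names come from Solr): if the target document is still pending, the modify 
command is remembered and replayed at commit time instead of reopening a reader 
just to fetch stored fields.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical buffering of modify commands inside an update handler.
public class BufferedModifySketch {

    static class ModifyCommand {
        final String id;
        final Map<String, Object> changes;
        ModifyCommand(String id, Map<String, Object> changes) {
            this.id = id;
            this.changes = changes;
        }
    }

    private final Map<String, List<ModifyCommand>> pending =
        new HashMap<String, List<ModifyCommand>>();

    void modify(ModifyCommand cmd, boolean oldDocIsCommitted) {
        if (!oldDocIsCommitted) {
            List<ModifyCommand> cmds = pending.get(cmd.id);
            if (cmds == null) {
                cmds = new ArrayList<ModifyCommand>();
                pending.put(cmd.id, cmds);
            }
            cmds.add(cmd);          // defer: apply once the old version is readable
            return;
        }
        applyNow(cmd);              // old version is committed: read, merge, re-add
    }

    void onCommit() {
        for (List<ModifyCommand> cmds : pending.values()) {
            for (ModifyCommand cmd : cmds) {
                applyNow(cmd);
            }
        }
        pending.clear();
    }

    void applyNow(ModifyCommand cmd) {
        // ... fetch stored fields for cmd.id, merge cmd.changes, re-index ...
    }
}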


 Support updateable/modifiable documents
 ---

 Key: SOLR-139
 URL: https://issues.apache.org/jira/browse/SOLR-139
 Project: Solr
  Issue Type: Improvement
  Components: update
Reporter: Ryan McKinley
Assignee: Ryan McKinley
 Attachments: SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, 
 SOLR-139-XmlUpdater.patch, 
 SOLR-269+139-ModifiableDocumentUpdateProcessor.patch


 It would be nice to be able to update some fields on a document without 
 having to insert the entire document.
 Given the way lucene is structured, (for now) one can only modify stored 
 fields.
 While we are at it, we can support incrementing an existing value - I think 
 this only makes sense for numbers.
 for background, see:
 http://www.nabble.com/loading-many-documents-by-ID-tf3145666.html#a8722293

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-240) java.io.IOException: Lock obtain timed out: SimpleFSLock

2007-07-13 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512554
 ] 

Yonik Seeley commented on SOLR-240:
---

No objections... a hang (in the event of bugs) will suffice for now.

 java.io.IOException: Lock obtain timed out: SimpleFSLock
 

 Key: SOLR-240
 URL: https://issues.apache.org/jira/browse/SOLR-240
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 1.2
 Environment: windows xp
Reporter: Will Johnson
 Attachments: IndexWriter.patch, IndexWriter2.patch, 
 IndexWriter2.patch, IndexWriter2.patch, stacktrace.txt, ThrashIndex.java


 when running the soon to be attached sample application against solr it will 
 eventually die.  this same error has happened on both windows and rh4 linux.  
 the app is just submitting docs with an id in batches of 10, performing a 
 commit then repeating over and over again.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-139) Support updateable/modifiable documents

2007-07-13 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley updated SOLR-139:
---

Attachment: SOLR-139-ModifyInputDocuments.patch

Updated patch to work with SOLR-269 UpdateRequestProcessors.

One thing I think is weird about this is that it uses parameters to specify the 
mode rather than the add command.  That is, to modify a document you have to 
send:

/update?mode=OVERWRITE,count:INCREMENT
<add>
 <doc>
  <field name="id">1</field>
  <field name="count">5</field>
 </doc>
</add>

rather than:
<add mode="OVERWRITE,count:INCREMENT">
 <doc>
  <field name="id">1</field>
  <field name="count">5</field>
 </doc>
</add>

This is fine, but it makes it hard to have an example 'modify' xml document.

 Support updateable/modifiable documents
 ---

 Key: SOLR-139
 URL: https://issues.apache.org/jira/browse/SOLR-139
 Project: Solr
  Issue Type: Improvement
  Components: update
Reporter: Ryan McKinley
Assignee: Ryan McKinley
 Attachments: SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, 
 SOLR-139-XmlUpdater.patch, 
 SOLR-269+139-ModifiableDocumentUpdateProcessor.patch


 It would be nice to be able to update some fields on a document without 
 having to insert the entire document.
 Given the way lucene is structured, (for now) one can only modify stored 
 fields.
 While we are at it, we can support incrementing an existing value - I think 
 this only makes sense for numbers.
 for background, see:
 http://www.nabble.com/loading-many-documents-by-ID-tf3145666.html#a8722293

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-258) Date based Facets

2007-07-13 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512559
 ] 

Hoss Man commented on SOLR-258:
---


1) i'm happy to break out the FacetParams into their own interface ... but i'd 
like to track that in a separate refactoring commit (since the existing facet 
params are already in SolrParams)

2) i clearly anticipated the FacetDateOther.get( bogus ) problem .. but for 
some reason i thought it returned null ... i'll fix that.

3) i actually considered before, between, and after originally but decided they 
were too long (i was trying to find a way to make start shorter as well) ... 
but two people thinking they're better convinces me.

4) my hesitation about renaming gap to interval is that i wanted to leave 
the door open for a separate interval option (to define a gap between the 
gaps so to speak) later should it be desired ... see the questions i listed 
when opening the bug.

5) i don't think this code makes sense for non-linear intervals ... the problem 
i'm really trying to solve here is using 3 params to express equal date 
divisions across an arbitrarily long time scale.   for the example you listed, 
simple facet.query options probably make more sense
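
for example, the "last N days" buckets from the earlier comment could be expressed 
today with something roughly like the following (illustrative only, using a 
hypothetical 'timestamp' field):

facet.query=timestamp:[NOW/DAY-7DAYS TO NOW/DAY]
facet.query=timestamp:[NOW/DAY-30DAYS TO NOW/DAY]
facet.query=timestamp:[NOW/DAY-90DAYS TO NOW/DAY]
facet.query=timestamp:[NOW/DAY-6MONTHS TO NOW/DAY]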

(although you do have me now thinking that another good faceting option 
would be some new facet.range where many values can be specified, they all 
get sorted, and then ranges are built between each successive value ... but that 
should be a separate issue)

6) i want to make start and end optional, but for now i can't think of a 
clean/fast way to do end ... and we can always add defaults later.

7) my preference is for every count to cover a range of exactly gap but i can 
definitely see where having a hard cutoff at end is useful, so i'll make it 
an option ... name suggestions?

i'll make sure to echo the value of end as well so it's easy to build filter 
queries for that last range ... probably should have it anyway to build filter 
queries on between and after.

should the ranges used to compute the between and after counts depend on where 
the last range ended or on the literal end param?

8) the NOW variance really bugs me ... back when i built DateMathParser i 
anticipated this by making the parser have a fixed concept of NOW which could 
be used to parse multiple strings, but i don't know why i didn't consider it when 
working on this new patch.
the real problem is that right now DateField is relied on to do all the 
parsing, and a single instance can't have a fixed notion of NOW ... it builds 
a new DateMathParser each time ... i think i'm going to have to do some heavy 
refactoring to fix this, which is annoying -- but i don't want to commit 
without fixing this, even if it takes a while.  any bug that can produce an off 
by 1 millisecond discrepancy should die a horrible horrible freaking death.



 Date based Facets
 -

 Key: SOLR-258
 URL: https://issues.apache.org/jira/browse/SOLR-258
 Project: Solr
  Issue Type: New Feature
Reporter: Hoss Man
Assignee: Hoss Man
 Attachments: date_facets.patch, date_facets.patch, date_facets.patch, 
 date_facets.patch, date_facets.patch


 1) Allow clients to express concepts like...
 * give me facet counts per day for every day this month.
 * give me facet counts per hour for every hour of today.
 * give me facet counts per hour for every hour of a specific day.
 * give me facet counts per hour for every hour of a specific day, and give me 
 facet counts for the number of matches before that day, or after that day. 
 2) Return all data in a way that makes it easy to use to build filter queries 
 on those date ranges.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-258) Date based Facets

2007-07-13 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512571
 ] 

Ryan McKinley commented on SOLR-258:


 
 but i'd like to track that in a separate refactoring commit (since the 
 existing facet params are already in SolrParams)

sounds good.

 ... originally but decided they were too long ..

In general, I favor longer self-explanatory param names over short ones.  It 
is kind of annoying to have to look up 'pf' or 'bq' to decode what they mean.  

- - -

Again, this is really great.  Now we can build the ubiquitous calendar widget 
from solr.

Thanks!

 Date based Facets
 -

 Key: SOLR-258
 URL: https://issues.apache.org/jira/browse/SOLR-258
 Project: Solr
  Issue Type: New Feature
Reporter: Hoss Man
Assignee: Hoss Man
 Attachments: date_facets.patch, date_facets.patch, date_facets.patch, 
 date_facets.patch, date_facets.patch


 1) Allow clients to express concepts like...
 * give me facet counts per day for every day this month.
 * give me facet counts per hour for every hour of today.
 * give me facet counts per hour for every hour of a specific day.
 * give me facet counts per hour for every hour of a specific day, and give me 
 facet counts for the number of matches before that day, or after that day. 
 2) Return all data in a way that makes it easy to use to build filter queries 
 on those date ranges.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: [jira] Reopened: (SOLR-298) NGramTokenFilter missing in trunk

2007-07-13 Thread Mike Klaas

On 13-Jul-07, at 12:48 AM, Thomas Peuss (JIRA) wrote:



 [ https://issues.apache.org/jira/browse/SOLR-298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Thomas Peuss reopened SOLR-298:
---


Sorry, I did not really state that this issue is about Solr. In the Solr trunk 
I don't find the ngram filters:


[EMAIL PROTECTED] /cygdrive/c/Projects/solr-trunk2
$ grep -ril ngramfilter *

[EMAIL PROTECTED] /cygdrive/c/Projects/solr-trunk2
$

This was a fresh checkout.


Solr includes these analyzers as a lucene jar, not source.

-Mike 


[jira] Resolved: (SOLR-240) java.io.IOException: Lock obtain timed out: SimpleFSLock

2007-07-13 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-240.
---

   Resolution: Fixed
Fix Version/s: 1.3
 Assignee: Hoss Man

Committed revision 556099.


 java.io.IOException: Lock obtain timed out: SimpleFSLock
 

 Key: SOLR-240
 URL: https://issues.apache.org/jira/browse/SOLR-240
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 1.2
 Environment: windows xp
Reporter: Will Johnson
Assignee: Hoss Man
 Fix For: 1.3

 Attachments: IndexWriter.patch, IndexWriter2.patch, 
 IndexWriter2.patch, IndexWriter2.patch, stacktrace.txt, ThrashIndex.java


 when running the soon to be attached sample application against solr it will 
 eventually die.  this same error has happened on both windows and rh4 linux.  
 the app is just submitting docs with an id in batches of 10, performing a 
 commit then repeating over and over again.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-300) create subclass of SingleInstanceLockFactory which warns loudly in the event of concurrent lock attempts

2007-07-13 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-300:
--

Component/s: update
   Priority: Minor  (was: Major)
 Issue Type: Wish  (was: Improvement)

 create subclass of SingleInstanceLockFactory which warns loudly in the event 
 of concurrent lock attempts
 

 Key: SOLR-300
 URL: https://issues.apache.org/jira/browse/SOLR-300
 Project: Solr
  Issue Type: Wish
  Components: update
Reporter: Hoss Man
Priority: Minor

 as noted by yonik in SOLR-240...
 How about SingleInstanceLockFactory to aid in catching concurrency bugs?
   ...
 or even better, a subclass or other implementation: 
 SingleInstanceWarnLockFactory or SingleInstanceCoordinatedLockFactory that 
 logs a failure if obtain() is called for a lock that is already locked.
 we should create a new subclass like Yonik describes and change 
 SolrIndexWriter to use this subclass if/when single is specified as the 
 lockType.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-287) set commitMaxTime when adding a document

2007-07-13 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley updated SOLR-287:
---

Attachment: SOLR-287-AddCommitMaxTime.patch

No real changes - updated to work with trunk.

Without objection, I think this should be added soon...

 set commitMaxTime when adding a document
 

 Key: SOLR-287
 URL: https://issues.apache.org/jira/browse/SOLR-287
 Project: Solr
  Issue Type: Improvement
Reporter: Ryan McKinley
Priority: Minor
 Attachments: SOLR-287-AddCommitMaxTime.patch, 
 SOLR-287-AddCommitMaxTime.patch


 Rather than setting a global autoCommit maxTime, it would be nice to set a 
 maximum time for a single add command.  This patch adds:
 <add commitMaxTime="1000">
   ...
 </add>
 to add the document within 1 sec.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-248) Capitalization Filter Factory

2007-07-13 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley updated SOLR-248:
---

Attachment: SOLR-248-CapitalizationFilter.patch

1. Added better javadocs explaining the configuration.
2. Removed the synchronized map.
3. Put the Filter as a package-private class in the Factory file -- since the 
filter relies on the factory, it is not particularly useful outside Solr.

I would like to add this soon
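
The patch itself defines the actual TokenFilter and Factory; purely as an 
illustration of the normalization it aims for (this is not the patch's code, and 
the real filter is configurable beyond this), the per-token transformation is 
roughly:

// Illustrative only: capitalize each whitespace-separated word of a token's text,
// so that "aerial views" and "Aerial Views" both normalize to "Aerial Views".
public class CapitalizeExample {

    static String capitalize(String text) {
        StringBuilder out = new StringBuilder(text.length());
        boolean startOfWord = true;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                startOfWord = true;
                out.append(c);
            } else if (startOfWord) {
                out.append(Character.toUpperCase(c));
                startOfWord = false;
            } else {
                out.append(Character.toLowerCase(c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(capitalize("Aerial views"));  // Aerial Views
        System.out.println(capitalize("aerial VIEWS"));  // Aerial Views
    }
}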


 Capitalization Filter Factory
 -

 Key: SOLR-248
 URL: https://issues.apache.org/jira/browse/SOLR-248
 Project: Solr
  Issue Type: New Feature
Reporter: Ryan McKinley
Priority: Minor
 Attachments: SOLR-248-CapitalizationFilter.patch, 
 SOLR-248-CapitalizationFilter.patch, SOLR-248-CapitalizationFilter.patch


 For tokens that are used in faceting, it is nice to have standard 
 capitalization.  
 I want "Aerial views" and "Aerial Views" to both be: "Aerial Views"

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (SOLR-248) Capitalization Filter Factory

2007-07-13 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley reassigned SOLR-248:
--

Assignee: Ryan McKinley

 Capitalization Filter Factory
 -

 Key: SOLR-248
 URL: https://issues.apache.org/jira/browse/SOLR-248
 Project: Solr
  Issue Type: New Feature
Reporter: Ryan McKinley
Assignee: Ryan McKinley
Priority: Minor
 Attachments: SOLR-248-CapitalizationFilter.patch, 
 SOLR-248-CapitalizationFilter.patch, SOLR-248-CapitalizationFilter.patch


 For tokens that are used in faceting, it is nice to have standard 
 capitalization.  
 I want "Aerial views" and "Aerial Views" to both be: "Aerial Views"

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-139) Support updateable/modifiable documents

2007-07-13 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512613
 ] 

Ryan McKinley commented on SOLR-139:


 
 The update handler could call the processor when it was time to do the 
 manipulation too.
 

What are you thinking?  Adding the processor as a parameter to AddUpdateCommand?

 ... ParallelReader, where some fields are in one sub-index  ...

the processor would ask the updateHandler for the existing document - the 
updateHandler deals with getting it to/from the right place.

we could add something like:
  Document getDocumentFromPendingOrCommited( String indexId )
to UpdateHandler and then that is taken care of.
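
Roughly what a processor would do with such a hook; everything below is 
hypothetical wiring around the method name proposed above, not committed API.

// Sketch of how a modify-processor might use the proposed UpdateHandler hook.
public class ModifySketch {

    interface UpdateHandler {
        /** Proposed: return the stored fields for indexId, whether pending or committed. */
        Document getDocumentFromPendingOrCommited(String indexId);
    }

    static class Document {
        // stand-in for org.apache.lucene.document.Document
    }

    static Document buildModifiedDoc(UpdateHandler handler, String id, Document newValues) {
        Document old = handler.getDocumentFromPendingOrCommited(id);
        if (old == null) {
            return newValues;           // nothing to merge with: behave like a plain add
        }
        return merge(old, newValues);   // apply OVERWRITE / APPEND / INCREMENT per field
    }

    static Document merge(Document old, Document newValues) {
        // ... field-by-field merge according to the requested mode ...
        return newValues;
    }
}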

Other than extracting the old document, what needs to be done that can't be done 
in the processor?  

 Support updateable/modifiable documents
 ---

 Key: SOLR-139
 URL: https://issues.apache.org/jira/browse/SOLR-139
 Project: Solr
  Issue Type: Improvement
  Components: update
Reporter: Ryan McKinley
Assignee: Ryan McKinley
 Attachments: SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, 
 SOLR-139-XmlUpdater.patch, 
 SOLR-269+139-ModifiableDocumentUpdateProcessor.patch


 It would be nice to be able to update some fields on a document without 
 having to insert the entire document.
 Given the way lucene is structured, (for now) one can only modify stored 
 fields.
 While we are at it, we can support incrementing an existing value - I think 
 this only makes sense for numbers.
 for background, see:
 http://www.nabble.com/loading-many-documents-by-ID-tf3145666.html#a8722293

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-139) Support updateable/modifiable documents

2007-07-13 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512617
 ] 

Yonik Seeley commented on SOLR-139:
---

 ... ParallelReader, where some fields are in one sub-index ...
 the processor would ask the updateHandler for the existing document - the 
 updateHandler deals with
 getting it to/from the right place.

The big reason you would use ParallelReader is to avoid touching the 
less-modified/bigger fields in one index when changing some of the other fields 
in the other index.

 What are you thinking? Adding the processor as a parameter to 
 AddUpdateCommand?

I didn't have a clear alternative... I was just pointing out the future 
pitfalls of assuming too much implementation knowledge.



 Support updateable/modifiable documents
 ---

 Key: SOLR-139
 URL: https://issues.apache.org/jira/browse/SOLR-139
 Project: Solr
  Issue Type: Improvement
  Components: update
Reporter: Ryan McKinley
Assignee: Ryan McKinley
 Attachments: SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, 
 SOLR-139-XmlUpdater.patch, 
 SOLR-269+139-ModifiableDocumentUpdateProcessor.patch


 It would be nice to be able to update some fields on a document without 
 having to insert the entire document.
 Given the way lucene is structured, (for now) one can only modify stored 
 fields.
 While we are at it, we can support incrementing an existing value - I think 
 this only makes sense for numbers.
 for background, see:
 http://www.nabble.com/loading-many-documents-by-ID-tf3145666.html#a8722293

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-258) Date based Facets

2007-07-13 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512626
 ] 

Hoss Man commented on SOLR-258:
---

 the big problem being that I doubt the SolrQueryRequest is always available 
 everywhere it's needed. 

...exactly, at the moment all of the date parsing is done inside DateField.

i think i'll try refactoring it so that DateMathParser does *all* the parsing, 
and make DateField delegate to it in the non-trivial case.

the problem that's still a pain to solve is getting all concepts of NOW to be 
the same for a request ... things like fq=f:[NOW TO NOW+1DAY] are handled by 
DateField via a query parser ... i can't think of an easy way to make that 
consistent with the facet parsing definition of NOW (without resorting to a 
ThreadLocal)
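
if the ThreadLocal route were taken, the core of it would be tiny (a sketch only, 
not part of any patch): the dispatcher sets one NOW per request and everything on 
that thread reads it.

import java.util.Date;

// Sketch: one NOW per request, visible to anything on the request's thread
// (query parsing, faceting, etc.), set at dispatch time and cleared afterwards.
public class RequestNow {
    private static final ThreadLocal<Date> NOW = new ThreadLocal<Date>();

    public static void set(Date now) { NOW.set(now); }

    public static Date get() {
        Date d = NOW.get();
        return (d != null) ? d : new Date();  // fall back when no request scope
    }

    public static void clear() { NOW.remove(); }
}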

 Date based Facets
 -

 Key: SOLR-258
 URL: https://issues.apache.org/jira/browse/SOLR-258
 Project: Solr
  Issue Type: New Feature
Reporter: Hoss Man
Assignee: Hoss Man
 Attachments: date_facets.patch, date_facets.patch, date_facets.patch, 
 date_facets.patch, date_facets.patch


 1) Allow clients to express concepts like...
 * give me facet counts per day for every day this month.
 * give me facet counts per hour for every hour of today.
 * give me facet counts per hour for every hour of a specific day.
 * give me facet counts per hour for every hour of a specific day, and give me 
 facet counts for the number of matches before that day, or after that day. 
 2) Return all data in a way that makes it easy to use to build filter queries 
 on those date ranges.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-139) Support updateable/modifiable documents

2007-07-13 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512628
 ] 

Ryan McKinley commented on SOLR-139:


 
 ... avoid touching the less-modified/bigger fields ...
 

aaah, perhaps a future updateHandler getDocument() function could take a list 
of fields it should extract.  There are still problems with what to do when you 
add it... maybe it checks if anything has changed in the less-modified index?  
I see your point.

 What are you thinking? Adding the processor as a parameter to 
 AddUpdateCommand?
 
 I didn't have a clear alternative... I was just pointing out the future 
 pitfalls of assuming too much implementation knowledge.
 

I am fine either way -- in the UpdateHandler or the Processors.  

Request plumbing-wise, it feels the most natural in a processor.  But if we 
rework the AddUpdateCommand it could fit there too.  I don't know if it is an 
advantage or disadvantage to have the 'modify' options tied to the command 
or to the request parameters.  Either way has its pluses and minuses, with no 
real winner (or loser) IMO.

In the end, I want to make sure that I never need a custom UpdateHandler (80% 
of it is Greek to me), but can easily change the 'modify' logic.

 Support updateable/modifiable documents
 ---

 Key: SOLR-139
 URL: https://issues.apache.org/jira/browse/SOLR-139
 Project: Solr
  Issue Type: Improvement
  Components: update
Reporter: Ryan McKinley
Assignee: Ryan McKinley
 Attachments: SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
 SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, 
 SOLR-139-XmlUpdater.patch, 
 SOLR-269+139-ModifiableDocumentUpdateProcessor.patch


 It would be nice to be able to update some fields on a document without 
 having to insert the entire document.
 Given the way lucene is structured, (for now) one can only modify stored 
 fields.
 While we are at it, we can support incrementing an existing value - I think 
 this only makes sense for numbers.
 for background, see:
 http://www.nabble.com/loading-many-documents-by-ID-tf3145666.html#a8722293

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: [jira] Commented: (SOLR-139) Support updateable/modifiable documents

2007-07-13 Thread Mike Klaas


On 13-Jul-07, at 1:53 PM, Yonik Seeley (JIRA) wrote:




... ParallelReader, where some fields are in one sub-index ...
the processor would ask the updateHandler for the existing  
document - the updateHandler deals with

getting it to/from the right place.


The big reason you would use ParallelReader is to avoid touching  
the less-modified/bigger fields in one index when changing some of  
the other fields in the other index.


I've pondered this a few times: it could be a huge win for  
highlighting apps, which can be stored-field-heavy.


However, I wonder if there is something that I am missing: PR  
requires perfect synchro of lucene doc ids, no?  If you update fields  
for a doc in one index, need not you (re-)store the fields in all  
other indices too, to keep the doc ids in sync?


-mike


Re: [jira] Commented: (SOLR-139) Support updateable/modifiable documents

2007-07-13 Thread Yonik Seeley

On 7/13/07, Mike Klaas [EMAIL PROTECTED] wrote:

 ... ParallelReader, where some fields are in one sub-index ...
 the processor would ask the updateHandler for the existing
 document - the updateHandler deals with
 getting it to/from the right place.

 The big reason you would use ParallelReader is to avoid touching
 the less-modified/bigger fields in one index when changing some of
 the other fields in the other index.

I've pondered this a few times: it could be a huge win for
highlighting apps, which can be stored-field-heavy.

However, I wonder if there is something that I am missing: PR
requires perfect synchro of lucene doc ids, no?  If you update fields
for a doc in one index, need not you (re-)store the fields in all
other indices too, to keep the doc ids in sync?


Well, it would be tricky... one PR use case would be to entirely
re-index one field (in its own separate index), thus maintaining
synchronization with the main index. As Doug said,
ParallelReader was not really designed to support incremental updates of
fields, but rather to accelerate batch updates. For incremental
updates you're probably better served by updating a single index.

That's probably not too useful for a general purpose platform like Solr.

Another way to support a more incremental model is perhaps to split up
the smaller volatile index into many segments so that updating a
single doc involves rewriting just that segment.

There might also be possibilities in different types of IndexReader
implementations: one could map docids to maintain synchronization.
This brings up a slightly different problem: lucene scorers expect
to go in docid order.

-Yonik


Re: nightly builds / solrj-lib

2007-07-13 Thread Chris Hostetter

: http://people.apache.org/builds/lucene/solr/nightly/
:
: the dist directory does not include the .jar files needed for solrj.
: Can we modify the script to include 'solrj-lib'?

if they're not in the nightly releases, then they won't make it into the
official releases either -- the nightly.sh just renames the standard
tgz/zip release files.

it looks like the problem is that the tarfileset used by the package
target only includes jar and war files from dist (not subdirs) ... i don't
see any reason why it shouldn't include dist/* ... so try changing that
and see if the artifacts from ant package start including the solrj
stuff.



-Hoss