Re: [Solr Wiki] Trivial Update of SolrRelevancyFAQ by YonikSeeley

2007-04-12 Thread Chris Hostetter

: New page:
: = Solr Relevancy FAQ =

Should we merge this info with SolrRelevancyCookbook?

(is the info on that page of any use to anyone?)

http://wiki.apache.org/solr/SolrRelevancyCookbook



-Hoss



Re: [Solr Wiki] Trivial Update of SolrRelevancyFAQ by YonikSeeley

2007-04-12 Thread Yonik Seeley

On 4/12/07, Chris Hostetter wrote:

> : New page:
> : = Solr Relevancy FAQ =

> Should we merge this info with SolrRelevancyCookbook?

> (is the info on that page of any use to anyone?)

> http://wiki.apache.org/solr/SolrRelevancyCookbook


I need to work on it more, then I was going to replace that page.

-Yonik


[jira] Created: (SOLR-207) snappuller inefficient finding latest snapshot

2007-04-12 Thread Yonik Seeley (JIRA)
snappuller inefficient finding latest snapshot
--

 Key: SOLR-207
 URL: https://issues.apache.org/jira/browse/SOLR-207
 Project: Solr
  Issue Type: Bug
  Components: replication
Reporter: Yonik Seeley


snapinstaller (and snappuller) do the following to find the latest snapshot:
name=`find ${data_dir} -name snapshot.* -print|grep -v wip|sort -r|head -1`

This recurses into all of the snapshot directories, doing much more disk
I/O than is necessary.
I think it is the cause of the bloated kernel memory usage we have seen on
some of our Linux boxes, caused by the kernel dentry and inode caches.
Those caches compete with the buffer cache (which caches the actual data
of the index) and can thus decrease performance.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-207) snappuller inefficient finding latest snapshot

2007-04-12 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-207:
--

Attachment: find_maxdepth.patch

uses -maxdepth 1 to avoid recursion.

Bill - does this look OK?
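
For readers following along, here is a minimal sketch of the -maxdepth idea. It is not the contents of find_maxdepth.patch; the function name latest_snapshot_find is made up for illustration, and -maxdepth is an extension found in GNU and BSD find rather than something every find supports.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this sketch).
# Prints the newest non-wip snapshot entry directly under the given directory.
latest_snapshot_find() {
    data_dir=$1
    # -maxdepth 1 stops find from descending into each snapshot directory.
    # Quoting the pattern keeps the shell from expanding it before find runs.
    find "$data_dir" -maxdepth 1 -name 'snapshot.*' -print \
        | grep -v wip | sort -r | head -1
}
```

Because the snapshot names embed a timestamp, a reverse lexical sort puts the most recent one first.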




[jira] Commented: (SOLR-207) snappuller inefficient finding latest snapshot

2007-04-12 Thread Bertrand Delacretaz (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12488413
 ] 

Bertrand Delacretaz commented on SOLR-207:
--

IIUC the snapshot directories are named like

  snapshot.YYYYMMDDHHMMSS

and they are all under the same parent directory.

If that's the case, then doing

  ls -rt ${data_dir}/snapshot.* | head -1

will return the name of the most recent directory, efficiently.





[jira] Commented: (SOLR-207) snappuller inefficient finding latest snapshot

2007-04-12 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12488418
 ] 

Yonik Seeley commented on SOLR-207:
---

That's close to the way it was done in the past, but some people ran into
problems because of shell restrictions w.r.t. the number or size of the
arguments passed to the process (because the shell expands the list).
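
As a rough illustration of that restriction: on most systems the cap comes from the operating system's limit on the combined size of arguments and environment handed to a new process, which getconf ARG_MAX reports. The sketch below prints that limit and counts snapshot entries through a pipe, so the names never appear on a command line. Both function names are hypothetical.

```shell
#!/bin/sh
# Print the system limit, in bytes, on arguments plus environment for a new process.
arg_max() {
    getconf ARG_MAX
}

# Hypothetical helper: count snapshot entries in a directory.
# ls receives only the directory name, so no wildcard is expanded by the shell.
count_snapshots() {
    ls "$1" | grep -c '^snapshot\.'
}
```

A glob such as ${data_dir}/snapshot.* is expanded by the shell into one argument per match, which is where a long list of snapshots can run into the limit.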




[jira] Created: (SOLR-208) RSS feed XSL example

2007-04-12 Thread Brian Whitman (JIRA)
RSS feed XSL example


 Key: SOLR-208
 URL: https://issues.apache.org/jira/browse/SOLR-208
 Project: Solr
  Issue Type: New Feature
  Components: clients - java
Affects Versions: 1.2
Reporter: Brian Whitman
Priority: Trivial
 Attachments: rss.xsl

A quick .xsl file for transforming Solr query results into RSS feeds. To get
the date and time formatted properly you'll need an XSLT 2.0 processor, as
described at http://wiki.apache.org/solr/XsltResponseWriter .  Tested to work
with the example Solr distribution in the nightly build.





[jira] Updated: (SOLR-208) RSS feed XSL example

2007-04-12 Thread Brian Whitman (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Whitman updated SOLR-208:
---

Attachment: rss.xsl

Attaching the rss.xsl file -- just put this in solr/conf/xslt/ and then try

http://localhost:8983/solr/select?q=ipod&wt=xslt&tr=rss.xsl
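
When trying this from a shell rather than a browser, the URL has to be quoted, since an unquoted & would put the command in the background. A small sketch, assuming the parameters are q, wt=xslt and tr=rss.xsl; the helper name rss_url is invented here, and the query term is assumed to be URL-safe already.

```shell
#!/bin/sh
# Hypothetical helper: build the select URL for a query term.
# No URL encoding is done; the term is assumed to need none.
rss_url() {
    printf 'http://localhost:8983/solr/select?q=%s&wt=xslt&tr=rss.xsl\n' "$1"
}

# Example call against a locally running example server (not run here):
# curl "$(rss_url ipod)"
```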






[jira] Commented: (SOLR-207) snappuller inefficient finding latest snapshot

2007-04-12 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12488420
 ] 

Yonik Seeley commented on SOLR-207:
---

Although, another alternative that doesn't have the shell expansion problem 
would be

ls -r ${data_dir} | grep snapshot\\.  | grep -v wip | head -1






[jira] Commented: (SOLR-207) snappuller inefficient finding latest snapshot

2007-04-12 Thread Bertrand Delacretaz (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12488468
 ] 

Bertrand Delacretaz commented on SOLR-207:
--

I think find -maxdepth is not supported on Solaris. And the -t option in my 
previous example was obviously wrong.

I'm not sure if ls -r sorts by filename everywhere (but I have no evidence that 
it does not).

The most portable version might be

  ls ${data_dir} | grep snapshot\\. | grep -v wip | sort -r | head -1 
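
The pipeline above can be wrapped into a small function for testing. This is only a sketch: the name latest_snapshot is made up, and the grep pattern is anchored with ^ here, which is a slight variation on the command as posted.

```shell
#!/bin/sh
# Hypothetical helper: print the name of the newest non-wip snapshot.
latest_snapshot() {
    data_dir=$1
    # ls gets only the directory, so the shell expands no wildcard.
    # The explicit sort -r avoids relying on how ls -r orders its output.
    ls "$data_dir" | grep '^snapshot\.' | grep -v wip | sort -r | head -1
}
```

Note that this prints the bare entry name, not a path, so a caller would prepend the data directory itself.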




[jira] Commented: (SOLR-207) snappuller inefficient finding latest snapshot

2007-04-12 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12488469
 ] 

Yonik Seeley commented on SOLR-207:
---

I tried both versions out, and the find version was quicker (on Linux at 
least).
System time was about the same, but ls had much higher user time.

$ time find . -maxdepth 1 -name 'snapshot.*' | grep -v wip | head -1
./snapshot.20070411235957

real    0m0.009s
user    0m0.002s
sys     0m0.008s

$ time ls -r . | grep snapshot\\. | grep -v wip | head -1
snapshot.20070412114504

real    0m0.050s
user    0m0.043s
sys     0m0.009s






[jira] Commented: (SOLR-207) snappuller inefficient finding latest snapshot

2007-04-12 Thread Bill Au (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12488491
 ] 

Bill Au commented on SOLR-207:
--

I confirmed that find -maxdepth does not work on Solaris.  So it is back to
ls.  We should be OK as long as we don't use any wildcard that causes
expansion.




[jira] Updated: (SOLR-207) snappuller inefficient finding latest snapshot

2007-04-12 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-207:
--

Attachment: find_maxdepth.patch

Updated patch:
- switches back to ls,
- tries to determine whether -maxdepth is supported, for the cleanup scripts
that need find -mtime,
- in snappuller, makes the master find the latest snapshot instead of sending
the complete ls output across the network.

This has not yet been tested.
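
One way such a check could look, as a sketch only: this is not the logic in the attached patch, and the names detect_maxdepth, maxdepth_opt and old_snapshots are invented for illustration. The probe simply runs find with -maxdepth and looks at the exit status.

```shell
#!/bin/sh
# Hypothetical probe: sets maxdepth_opt to "-maxdepth 1" when the local
# find accepts the option, and to the empty string otherwise.
detect_maxdepth() {
    if find . -maxdepth 0 -print >/dev/null 2>&1; then
        maxdepth_opt="-maxdepth 1"
    else
        maxdepth_opt=""
    fi
}

# Hypothetical cleanup step: list snapshots last modified more than
# the given number of days ago.
old_snapshots() {
    data_dir=$1
    days=$2
    detect_maxdepth
    # $maxdepth_opt is left unquoted on purpose so it splits into two
    # words, or disappears entirely when empty.
    find "$data_dir" $maxdepth_opt -name 'snapshot.*' -mtime +"$days" -print
}
```

Where -maxdepth is missing, this falls back to a recursive find, so it is slower there but still works.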




[jira] Updated: (SOLR-207) snappuller inefficient finding latest snapshot

2007-04-12 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-207:
--

Attachment: find_maxdepth.patch

re-attaching with ASF perms (in the older JIRA version, the grant license 
option was first, and now it is last... hence I keep clicking the incorrect one)




[jira] Updated: (SOLR-207) snappuller inefficient finding latest snapshot

2007-04-12 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-207:
--

Attachment: (was: find_maxdepth.patch)
