[jira] Commented: (SOLR-207) snappuller inefficient finding latest snapshot

2007-04-12 Thread Bertrand Delacretaz (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12488413
 ] 

Bertrand Delacretaz commented on SOLR-207:
--

IIUC the snapshot directories are named like

  snapshot.YYYYMMDDHHMMSS

and they are all under the same parent directory.

If that's the case, then doing

  ls -rt ${data_dir}/snapshot.* | head -1

will return the name of the most recent directory, efficiently.


 snappuller inefficient finding latest snapshot
 --

 Key: SOLR-207
 URL: https://issues.apache.org/jira/browse/SOLR-207
 Project: Solr
  Issue Type: Bug
  Components: replication
Reporter: Yonik Seeley
 Attachments: find_maxdepth.patch


 snapinstaller (and snappuller) do the following to find the latest snapshot:
 name=`find ${data_dir} -name snapshot.* -print|grep -v wip|sort -r|head -1`
 This recurses into all of the snapshot directories, doing much more disk I/O
 than is necessary.
 I think it is the cause of the bloated kernel memory usage we have seen on
 some of our Linux boxes, caused by the kernel dentry and inode caches.
 Those caches compete with the buffer cache (which caches the actual data of
 the index) and can thus decrease performance.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-207) snappuller inefficient finding latest snapshot

2007-04-12 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12488418
 ] 

Yonik Seeley commented on SOLR-207:
---

That's close to the way it was done in the past, but some people ran into 
problems because of restrictions on the number or size of the arguments 
passed to the process (because the shell expands the list).
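
A minimal sketch of the expansion issue described above, assuming a throwaway directory created with mktemp -d (widely available, though not required by POSIX). The helper count_args and the directory names are hypothetical and only illustrate that the command receives one argument per glob match:

```shell
#!/bin/sh
# Hypothetical helper: prints how many arguments it received.
count_args() {
  echo "$#"
}

# Throwaway directory with three illustrative snapshot directories.
glob_demo_dir=$(mktemp -d)
mkdir "$glob_demo_dir/snapshot.20070411235957" \
      "$glob_demo_dir/snapshot.20070412114504" \
      "$glob_demo_dir/snapshot.20070412120000"

# The shell expands the glob before the command runs, so the command
# sees one argument per matching directory.
expanded_count=$(count_args "$glob_demo_dir"/snapshot.*)
echo "arguments after expansion: $expanded_count"

# Upper bound (in bytes) on arguments plus environment for a new process.
getconf ARG_MAX

rm -rf "$glob_demo_dir"
```

With enough snapshot directories, the expanded list can exceed the value reported by getconf ARG_MAX, at which point an external command such as ls fails to start.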




[jira] Commented: (SOLR-207) snappuller inefficient finding latest snapshot

2007-04-12 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12488420
 ] 

Yonik Seeley commented on SOLR-207:
---

Although, another alternative that doesn't have the shell expansion problem 
would be

ls -r ${data_dir} | grep snapshot\\.  | grep -v wip | head -1






[jira] Commented: (SOLR-207) snappuller inefficient finding latest snapshot

2007-04-12 Thread Bertrand Delacretaz (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12488468
 ] 

Bertrand Delacretaz commented on SOLR-207:
--

I think find -maxdepth is not supported on Solaris. And the -t option in my 
previous example was obviously wrong.

I'm not sure if ls -r sorts by filename everywhere (but I have no evidence that 
it does not).

The most portable version might be

  ls ${data_dir} | grep snapshot\\. | grep -v wip | sort -r | head -1 
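
As a rough sketch, the pipeline above could be wrapped as follows. The function name latest_snapshot, the anchored and quoted grep pattern, and the demo directory names (including the -wip suffix) are assumptions for illustration, not taken from snappuller itself:

```shell
#!/bin/sh
# Hypothetical helper; $1 is the data directory.
latest_snapshot() {
  # List only the top level of the data directory (no recursion, no glob),
  # keep snapshot entries, drop work-in-progress ones, and take the
  # lexically greatest name, which is the newest timestamp.
  ls "$1" | grep '^snapshot\.' | grep -v wip | sort -r | head -1
}

# Throwaway demo data; the directory names are illustrative.
ls_demo_dir=$(mktemp -d)
mkdir "$ls_demo_dir/snapshot.20070411235957" \
      "$ls_demo_dir/snapshot.20070412114504" \
      "$ls_demo_dir/snapshot.20070412120000-wip"

latest=$(latest_snapshot "$ls_demo_dir")
echo "$latest"

rm -rf "$ls_demo_dir"
```

Because the timestamp is fixed-width and most-significant-first, a plain reverse lexical sort is enough to put the newest snapshot on the first line.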




[jira] Commented: (SOLR-207) snappuller inefficient finding latest snapshot

2007-04-12 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12488469
 ] 

Yonik Seeley commented on SOLR-207:
---

I tried both versions out, and the find version was quicker (on Linux at 
least).
System time was about the same, but ls had much higher user time.

$ time find . -maxdepth 1 -name 'snapshot.*' | grep -v wip | head -1
./snapshot.20070411235957

real    0m0.009s
user    0m0.002s
sys     0m0.008s

$ time ls -r . | grep snapshot\\. | grep -v wip | head -1
snapshot.20070412114504

real    0m0.050s
user    0m0.043s
sys     0m0.009s
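
Note that the two commands timed above print different snapshots: find emits entries in directory order, so without a sort its first line is not necessarily the newest. A sketch of a find variant that keeps the sort -r step from the original script, assuming a find that supports -maxdepth (GNU and BSD find do; it is not in POSIX). The function name latest_snapshot_find and the demo directory names are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper; $1 is the data directory.
latest_snapshot_find() {
  # -maxdepth 1 stops find from descending into the snapshot directories;
  # sort -r puts the newest timestamp first.
  find "$1" -maxdepth 1 -name 'snapshot.*' | grep -v wip | sort -r | head -1
}

# Throwaway demo data; the directory names are illustrative.
find_demo_dir=$(mktemp -d)
mkdir "$find_demo_dir/snapshot.20070411235957" \
      "$find_demo_dir/snapshot.20070412114504" \
      "$find_demo_dir/snapshot.20070412120000-wip"

latest_via_find=$(latest_snapshot_find "$find_demo_dir")
echo "$latest_via_find"

rm -rf "$find_demo_dir"
```

Unlike the ls pipeline, this prints the path including the data directory prefix, so callers that want only the name would need basename.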






[jira] Commented: (SOLR-207) snappuller inefficient finding latest snapshot

2007-04-12 Thread Bill Au (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12488491
 ] 

Bill Au commented on SOLR-207:
--

I confirmed that find -maxdepth does not work on Solaris.  So it is back to ls.
We should be OK as long as we don't use any wildcard that the shell expands
into an argument list.
