Re: How To Do It Faster?!?

2005-04-02 Thread Simo Melenius
[EMAIL PROTECTED] writes:

> >$ find . -type f -printf "%T@ %u %s %p\n" > /yourserverroot/files.txt
> That is a nice idea. I don't know very much about Unix, but I suppose that
> on a ksh I can run this command (or a similar one) in order to obtain the
> list I need. If anyone knows whether that command will also run on a plain
> ksh, could you please confirm that?

That depends on the unix flavor you're using -- my example was for GNU
utilities which are heavily used on (probably all) Linux systems.
BSDs, Solaris and other unixen have slightly different 'find' syntax.
Use "man find" to find out more about how 'find' works on your system.

On all systems I know, it goes like:

find <path> [-switches ...]

where the switches vary depending on the system. In my example, I used
-type f (f as in file) to list only files (otherwise 'find' will
include directories in the output too) and -printf to include the
desired data -- in this case, last modified time, owner, size and path
-- in the output (otherwise 'find' will only print the path).

You should at least go through the -printf formatting codes to see
what information you're able to include in the output (=> man find).

I used %T@ to print the last modified time in Unix time because it's
as simple as it can be: an integer, counting the number of seconds
since Jan 1 1970. Python's "time" module groks Unix time just like
that.
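
For instance, a minimal sketch of turning that integer back into
something readable (the value below is made up):

    import time

    mtime = 1112392800   # hypothetical %T@ value taken from files.txt
    print(time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(mtime)))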

> Moreover, I could run this script in a while loop, like:

Except that, I'd imagine, constantly traversing the filesystem will
seriously degrade the performance of the file server. You want to run
your script periodically over a day, maybe at times when the server is
inactive. Or hourly between 8am-4pm and then once at night.

In Unix, there's a facility called cron to do just that: it runs
scripts and commands over and over again -- hourly, daily, weekly, or
just whenever you want. Consult your unix flavor's manual or
newsgroup on that.
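
As an illustration, a crontab along these lines (edit it with
"crontab -e") would implement that schedule. One caveat: percent signs
are special in crontab lines, so the find command is best wrapped in a
small script -- make_filelist.sh below is just a made-up name for it:

    # min hour dom mon dow  command
    0     8-16  *   *   *   /yourserverroot/make_filelist.sh
    30    23    *   *   *   /yourserverroot/make_filelist.sh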

> copy /yourserverroot/files.txt   /yourserverroot/filesbackup.txt
> always have the filesbackup.txt up-to-date, as a function of the "find"
> speed on the server.

Yes, creating a temporary file is a good approach. I'd suggest moving
the new list over the old one (mv tmpfile filelist.txt) instead of
copying, since a move is usually just a rename operation on the
filesystem and doesn't actually involve copying any data.


br,
S

-- 
[EMAIL PROTECTED]
-- 
http://mail.python.org/mailman/listinfo/python-list


How To Do It Faster?!?

2005-04-02 Thread andrea_gavana
Hello Simo & NG,

>Correct me if I'm wrong but since it _seems_ that the listing doesn't
>need to be up-to-date each minute/hour as the users will be looking
>primarily for old/unused files, why not have a daily cronjob on the
>Unix server to produce an appropriate file list on e.g. the root
>directory of your file server?

You are correct. I don't need this list to be updated every minute/hour.

>$ find . -type f -printf "%T@ %u %s %p\n" > /yourserverroot/files.txt

That is a nice idea. I don't know very much about Unix, but I suppose that
on a ksh I can run this command (or a similar one) in order to obtain the
list I need. If anyone knows whether that command will also run on a plain
ksh, could you please confirm that?

Moreover, I could run this script in a while loop, like:

while true
do
    if [ -e /yourserverroot/filesbackup.txt ]; then
        # refresh the listing, then publish it as the backup copy
        find . -type f -printf "%T@ %u %s %p\n" > /yourserverroot/files.txt
        cp /yourserverroot/files.txt /yourserverroot/filesbackup.txt
    else
        # first run: create the backup listing directly
        find . -type f -printf "%T@ %u %s %p\n" > /yourserverroot/filesbackup.txt
    fi
done

or something similar (I don't have Unix at hand now, so I cannot test the
commands and, as I said, I don't know Unix very well...). In this way, I
always have the filesbackup.txt up-to-date, as a function of the "find"
speed on the server.
Then my GUI could scan the filesbackup.txt file and search for a particular
user's information.

Thanks to all the NG for your suggestions!

Andrea.

--
http://mail.python.org/mailman/listinfo/python-list


Re: How To Do It Faster?!?

2005-04-01 Thread Simo Melenius
[EMAIL PROTECTED] writes:

> Every user of this big directory works on big studies regarding oil
> fields. Knowing the amount of data (and number of files) we have to
> deal with (produced by simulators, visualization tools, and so on),
> and knowing that users are usually lazy about cleaning up unused/old
> files, this is a way for any one of us to quickly scan all the
> directories and identify which files belong to him. Having them in
> an organized, size-sorted wxPython list, the user can decide whether
> he wants to delete some files (which he almost surely forgot even
> exist...) or not. It is as easy as a button click (retrieve the
> data-->delete the files).

Correct me if I'm wrong but since it _seems_ that the listing doesn't
need to be up-to-date each minute/hour as the users will be looking
primarily for old/unused files, why not have a daily cronjob on the
Unix server to produce an appropriate file list on e.g. the root
directory of your file server?

Your Python client would then load that (possibly compressed) text
file from the network share and find the needed bits in there.

Note that if some "old/unneeded" files are missing today, they'll show
right up the following day.

For example, running the GNU find command like this:

$ find . -type f -printf "%T@ %u %s %p\n" > /yourserverroot/files.txt

produces a file where each line contains the last modified time,
username, size and path for one file. Dead easy to parse with Python,
and you'll only have to set up the cronjob _once_ on the Unix server.

(If the file becomes too big, grep can be additionally used to split
the file e.g. per each user.)
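
Parsing it really is a couple of lines -- a rough sketch, assuming the
"%T@ %u %s %p" format above and a made-up username:

    owner = "ag12905"    # whoever you're reporting on
    rows = []
    for line in open("/yourserverroot/files.txt"):
        # mtime, user, size, path; the path may contain spaces,
        # so split at most three times
        mtime, user, size, path = line.rstrip("\n").split(" ", 3)
        if user == owner:
            rows.append((int(size), float(mtime), path))
    rows.sort()        # smallest first...
    rows.reverse()     # ...now biggest first, for a size-sorted list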


br,
S

-- 
[EMAIL PROTECTED]
-- 
http://mail.python.org/mailman/listinfo/python-list


FAM and Python? (was Re: How To Do It Faster?!?)

2005-04-01 Thread Jeremy Bowers
On Sat, 02 Apr 2005 02:02:31 +0200, andrea_gavana wrote:

> Hello Jeremy & NG,
> Every user of this big directory works on big studies regarding oil fields.
> Knowing the amount of data (and number of files) we have to deal with
> (produced by simulators, visualization tools, and so on), and knowing that
> users are usually lazy about cleaning up unused/old files, this is a way
> for any one of us to quickly scan all the directories and identify which
> files belong to him. Having them in an organized, size-sorted wxPython
> list, the user can decide whether he wants to delete some files (which he
> almost surely forgot even exist...) or not. It is as easy as a button
> click (retrieve the data-->delete the files).

Got it. A good idea!

>>Here's an idea to sort of come at the problem from a different angle. Can
>>you run something on the file server itself, and use RPC to access it?
> 
> I don't even know what RPC is... I have to look into it.

RPC stands for "remote procedure call". The idea is that you do something
that looks like a normal function call, except it happens on a remote
server. Complexity varies widely.

Given your situation, and if running something on the UNIX server is a
possibility, I'd recommend downloading and playing with Pyro; it is Python
specific, so I think it would be the best thing for you, being powerful,
well integrated with Python, and easy to use.

Then, on your client machine in Windows, ultimately you'd make some sort
of call to your server like

fileList = server.getFileList(user)

and you'd get the file list for that user, returning whatever you want for
your app; a list of tuples, objects, whatever you want. Pyro will add no
constraints to your app.
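
If you'd rather prototype the idea with nothing but the standard
library, XML-RPC can stand in for Pyro (module names below are the
Python 2 ones; the lookup body is left hypothetical):

    # on the Unix server
    from SimpleXMLRPCServer import SimpleXMLRPCServer

    def getFileList(user):
        # hypothetical: consult your in-memory structures (or the
        # pre-built file list) and return e.g. [[size, mtime, path], ...]
        return []

    server = SimpleXMLRPCServer(("", 8000))
    server.register_function(getFileList)
    server.serve_forever()

and on the Windows client:

    import xmlrpclib
    server = xmlrpclib.ServerProxy("http://unixserver:8000")
    fileList = server.getFileList("ag12905")

Pyro buys you the same thing with less ceremony and richer Python types.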

> I am not sure if my new explanation fits with your last information... as
> above, I didn't even know about fam... I've read a little, but probably
> I am too much of a newbie to see a link between it and my goal. Do you
> think one exists? It would be nice to have something that tracks the file
> status on the whole file system, but it is probably a LOT of work wrt
> what my app should be able to do.

Maybe, maybe not. I've never used FAM. Perhaps someone who has can chime
in about the ease of use; I've changed the subject to try to attract such
a person. It also depends on whether FAM works on your UNIX.

My point is that you can do one scan at startup (can't avoid this), but
then as the file system monitor tells you that a change has occurred, you
update your data structures to account for the change. That way, your data
is always in sync. (For safety's sake, you might set the server to
terminate itself and re-start every night.) Since it's always in sync, you
can send this data back instead of scanning the file system.
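
The bookkeeping itself is simple enough. A sketch of the idea, with a
hypothetical apply_event() fed by whatever monitor API you end up
using:

    # totals: {user: {path: (size, mtime)}}, filled by the startup scan
    totals = {}

    def apply_event(event, user, path, size, mtime):
        # event is "created", "changed" or "deleted" from the monitor
        files = totals.setdefault(user, {})
        if event == "deleted":
            files.pop(path, None)
        else:
            files[path] = (size, mtime)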

At this point, my suggestion would be to consider whether you want to
spend the effort to speed it up like this, which is something only you
(and presumably your managers) are in a position to know, given that you
have an existing tool (at least, you seem to speak like you have a
functional tool). If you do, then I'd take some time and work a bit with
Pyro and FAM, and *then* re-evaluate where you stand. By then you'll
probably be able to ask better questions, too, and like I said above,
perhaps someone will share their experiences with FAM.

Good luck, and have fun; seriously, that's important here.
-- 
http://mail.python.org/mailman/listinfo/python-list


How To Do It Faster?!?

2005-04-01 Thread andrea_gavana
Hello Jeremy & NG,

>Yes, clearer, though I still don't know what you're *doing* with that data :-)

Every user of this big directory works on big studies regarding oil fields.
Knowing the amount of data (and number of files) we have to deal with
(produced by simulators, visualization tools, and so on), and knowing that
users are usually lazy about cleaning up unused/old files, this is a way
for any one of us to quickly scan all the directories and identify which
files belong to him. Having them in an organized, size-sorted wxPython
list, the user can decide whether he wants to delete some files (which he
almost surely forgot even exist...) or not. It is as easy as a button
click (retrieve the data-->delete the files).

>Here's an idea to sort of come at the problem from a different angle. Can
>you run something on the file server itself, and use RPC to access it?

I don't even know what RPC is... I have to look into it.


>The reason I mention this is a lot of UNIXes have an API to detect file
>changes live; for instance, google "python fam". It would be easy to hook
>something up to scan the files at startup and maintain your totals live,
>and then use one of the many extremely easy Python RPC mechanisms to
>request the data as the user wants it, which would most likely come back
>at network speeds (fast).

I am not sure if my new explanation fits with your last information... as
above, I didn't even know about fam... I've read a little, but probably
I am too much of a newbie to see a link between it and my goal. Do you
think one exists? It would be nice to have something that tracks the file
status on the whole file system, but it is probably a LOT of work wrt what
my app should be able to do.
Anyway, thanks for the hints! If my new explanation changes anything, can
anyone post some more comments?

Thanks to you all.

Andrea.

--
http://mail.python.org/mailman/listinfo/python-list


Re: How To Do It Faster?!?

2005-04-01 Thread Jeremy Bowers
On Sat, 02 Apr 2005 01:00:34 +0200, andrea_gavana wrote:

> Hello Jeremy & NG,
> ...
> I hope to have been clearer this time...
> 
> I really welcome all your suggestions.

Yes, clearer, though I still don't know what you're *doing* with that data :-)

Here's an idea to sort of come at the problem from a different angle. Can
you run something on the file server itself, and use RPC to access it?

The reason I mention this is a lot of UNIXes have an API to detect file
changes live; for instance, google "python fam". It would be easy to hook
something up to scan the files at startup and maintain your totals live,
and then use one of the many extremely easy Python RPC mechanisms to
request the data as the user wants it, which would most likely come back
at network speeds (fast).

This would be orders of magnitude faster, and no scanning system could
compete with it.
-- 
http://mail.python.org/mailman/listinfo/python-list


How To Do It Faster?!?

2005-04-01 Thread andrea_gavana
Hello Jeremy & NG,

>* Poke around in the Windows API for a function that does what you want,
>and hope it can do it faster due to being in the kernel.

I could try it, but I think I have to explain my problem a little bit more.

>If you post more information about how you are using this data, I can try
>to help you.

Basically, I have to scan a really BIG directory: essentially, it is a UNIX
file system where all our projects reside, with thousands and thousands of
files and more than 1 TB of information. Also, we are about 200-300 users
of this space. This is what I do now and what I would like to improve:

1) For a particular user (1 and only 1 at a time), I would like to scan
all directories and subdirectories in order to find which FILES are owned
by this user (I am NOT interested in directory owners, only files). Given
that I am searching for only 1 user, his disk quota is around 20-30 GB,
or something like that;
2) My application is a GUI designed with wxPython. It runs on Windows at
the moment (this is why I am asking about Windows user IDs and similar;
on Unix it is much simpler);
3) While scanning the directories (using os.walk), I process the results
of my command "dir /q /-c /a-d MyDirectory" and I display these results in
a wxListCtrl (a list viewer) of wxPython in my GUI;
4) I would not use the suggested command "dir /S" in a DOS shell because,
even if it scans all directories recursively, I am NOT able to process
intermediate results, since that command never returns until it has
finished scanning ALL directories (and for 1 TB of files, that can take a
LOT of time);
5) For all the files in each directory scanned, I do:
- IF a file belongs to that particular user THEN:
  Get the file name;
  Get the file size;
  Get the last modification date;
  Display the result on my wxListCtrl
- ELSE:
  Disregard the information;
- END

I get the file owner using the /Q switch of the DIR command, and I exclude
a priori the subdirectories using the /a-d switch. That is because I am
using os.walk().
6) All of our users can see this big Unix directory on their PCs, labeled
as E:\ or F:\ or whatever. I cannot anyway use UNIX commands in DOS (and
I cannot use rsh to communicate with the Unix machine and then use something
like "find . -name etc").

I hope to have been clearer this time...

I really welcome all your suggestions.

Andrea.

--
http://mail.python.org/mailman/listinfo/python-list


Re: How To Do It Faster?!?

2005-04-01 Thread Jeremy Bowers
On Thu, 31 Mar 2005 13:38:34 +0200, andrea.gavana wrote:

> Hello NG,
> 
>   in my application, I use os.walk() to walk on a BIG directory. I need
> to retrieve the files, in each sub-directory, that are owned by a
> particular user. Noting that I am on Windows (2000 or XP), this is what I
> do:

You should *try* directly retrieving the relevant information from the OS,
instead of spawning a "dir" process. I have no idea how to do that and it
will probably require the win32 extensions for Python.

After that, you're done. Odds are you'll be disk bound. In fact, you may
get no gain if Windows is optimized enough that the process you describe
below is *still* disk-bound.

Your only hope then lies in two things:

* Poke around in the Windows API for a function that does what you want,
and hope it can do it faster due to being in the kernel.

* Somehow work this out to be lazy so it tries to grab what the user is
looking at, instead of absolutely everything. Whether or not this will
work depends on your application. If you post more information about how
you are using this data, I can try to help you. (I've had some experience
in this domain, but what is good heavily depends on what you are doing.
For instance, if you're batch processing a whole bunch of records after
the user gave a bulk command, there's not much you can do. But if they're
looking at something in a Windows Explorer-like tree view, there's a lot
you can do to improve responsiveness, even if you can't speed up the
process overall.)
-- 
http://mail.python.org/mailman/listinfo/python-list


How To Do It Faster?!?

2005-03-31 Thread andrea_gavana
Hello max & NG,

>I don't quite understand what your program is doing. The user=a[18::20]
>looks really fragile/specific to a directory to me.

I corrected it to user=a[18::5][:-2]; it was my mistake. However, that command
is NOT specific to a particular directory. You can try it on any directory
or net resource mounted on your system. It works.

>>> a=os.popen("dir /s /q /-c /a-d " + root).read().splitlines()

Mhm... have you tried this command on a BIG directory? On your C: drive,
for example? I had to kill Python after issuing that command because
it ate up all my CPU (1GB) for quite a long time. There is simply too much
file information to retrieve in a single command.
In my first mail, I said I have to work with a BIG directory (more than
1 TB) and I need to retrieve information as it becomes available (I put
this info in a wxPython ListCtrl). This is why I have chosen os.walk() and
that command (which runs on a separate thread wrt the ListCtrl).
It does NOT run faster than your command (probably my solution is slower),
but I get information on every directory I scan, while with your command
I have to wait a long time before I can process the results, and the user
cannot interact with the results already found.

>To get a list containing files owned by a specific user, do something like:
>>> files=[line.split()[-1] for line in a if owner in line]

I will try this solution also.

Thanks NG for your useful suggestions.

Andrea.

--
http://mail.python.org/mailman/listinfo/python-list


Re: Rif: Re: How To Do It Faster?!?

2005-03-31 Thread Peter Hansen
[EMAIL PROTECTED] wrote:
> Unfortunately, on Windows it does not seem to work very well:
>
> >>> st = os.stat('MyFile.txt')
> >>> print st.st_uid
> 0
>
> I don't think my user ID is 0...
>
> While with the DOS command I get:
>
> userid: \\ENI\ag12905
I would recommend using the pywin32 support that almost
certainly exists for getting the owner of a file.
On the other hand, I'm not familiar with the Windows
API in question, but either somebody else will point
you to it, or a search on msdn.com would find it for
you.
-Peter
--
http://mail.python.org/mailman/listinfo/python-list


Re: How To Do It Faster?!?

2005-03-31 Thread Max Erickson
I don't quite understand what your program is doing. The user=a[18::20]
looks really fragile/specific to a directory to me. Try something like
this:

>>> a=os.popen("dir /s /q /-c /a-d " + root).read().splitlines()

Should give you the dir output split into lines, for every file below
root (notice that I added '/s' to the dir command). There will be some
extra lines in a that aren't about specific files...

>>> a[0]
' Volume in drive C has no label.'

but the files should be there.

>>> len(a)
232

To get a list containing files owned by a specific user, do something
like:
>>> files=[line.split()[-1] for line in a if owner in line]
>>> len(files)
118

This is throwing away directory information, but using os.walk()
instead of the /s switch to dir should work, if you need it...

max

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How To Do It Faster?!?

2005-03-31 Thread Aquila Deus
[EMAIL PROTECTED] wrote:
> Hello NG,
>
>   in my application, I use os.walk() to walk on a BIG directory. I need
> to retrieve the files, in each sub-directory, that are owned by a
> particular user. Noting that I am on Windows (2000 or XP), this is what I
> do:
>
> for root, dirs, files in os.walk(MyBIGDirectory):
>
>     a = os.popen("dir /q /-c /a-d " + root).read().split()
>
>     # Retrieve all file owners
>     user = a[18::20]
>
>     # Retrieve all the last modification dates & hours
>     date = a[15::20]
>     hours = a[16::20]
>
>     # Retrieve all the filenames
>     name = a[19::20]
>
>     # Retrieve all the file sizes
>     size = a[17::20]
>
>     # Loop through all file owners to see if they belong
>     # to that particular owner (a string)
>     for util in user:
>         if util.find(owner) >= 0:
>             DO SOME PROCESSING
>
> Does anyone know if there is a faster way to do this job?

You may use "dir /s", which lists everything recursively.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Rif: Re: How To Do It Faster?!?

2005-03-31 Thread Diez B. Roggisch
[EMAIL PROTECTED] wrote:
> Am I missing something on the stat module? I'm running Python 2.3.4.
> 

Yes, you are missing that this is more unix-like. It seems to work to a
certain degree on Windows - but as the user model differs considerably
between Unix and Windows, you have found a not-so-well-working part.

I don't think that your code could be much faster at all - the limits are
not so much within Python as within Windows itself. The only thing I can
think of is to use python-win32 to make the same calls yourself that dir
makes with your various options. That would maybe save you the additional
overhead of creating a string representation (done by dir) and parsing
that - but I doubt the performance gain justifies the means.
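
If you want to try that route, the owner lookup goes through pywin32's
win32security module -- a minimal, untested sketch:

    import win32security

    def file_owner(path):
        # fetch only the owner part of the file's security descriptor
        sd = win32security.GetFileSecurity(
            path, win32security.OWNER_SECURITY_INFORMATION)
        sid = sd.GetSecurityDescriptorOwner()
        # resolve the SID to a DOMAIN\user string
        name, domain, acct_type = win32security.LookupAccountSid(None, sid)
        return "%s\\%s" % (domain, name)

Called on each file from the os.walk() loop, that would replace the dir
parsing entirely.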

-- 
Regards,

Diez B. Roggisch
-- 
http://mail.python.org/mailman/listinfo/python-list


Rif: Re: How To Do It Faster?!?

2005-03-31 Thread andrea . gavana

Hello Laszlo & NG,

>You can use the stat module to get attributes like last modification
>date, uid, gid etc. The documentation of the stat module has a nice
>example. It will probably be faster, because right now you are running an
>external program (well, "dir" may be resident, but the OS still needs to
>create a new shell and interpret the parameters on every invocation).

Unfortunately, on Windows it does not seem to work very well:


>>> st = os.stat('MyFile.txt')
>>> print st.st_uid
0

I don't think my user ID is 0...

While with the DOS command I get:

userid: \\ENI\ag12905

Am I missing something on the stat module? I'm running Python 2.3.4.

Thanks a lot.

Andrea.



-- 
http://mail.python.org/mailman/listinfo/python-list


How To Do It Faster?!?

2005-03-31 Thread andrea . gavana
Hello NG,

  in my application, I use os.walk() to walk on a BIG directory. I need
to retrieve the files, in each sub-directory, that are owned by a
particular user. Noting that I am on Windows (2000 or XP), this is what I
do:

for root, dirs, files in os.walk(MyBIGDirectory):

    a = os.popen("dir /q /-c /a-d " + root).read().split()

    # Retrieve all file owners
    user = a[18::20]

    # Retrieve all the last modification dates & hours
    date = a[15::20]
    hours = a[16::20]

    # Retrieve all the filenames
    name = a[19::20]

    # Retrieve all the file sizes
    size = a[17::20]

    # Loop through all file owners to see if they belong
    # to that particular owner (a string)
    for util in user:
        if util.find(owner) >= 0:
            DO SOME PROCESSING

Does anyone know if there is a faster way to do this job?

Thanks to you all.

Andrea.



-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How To Do It Faster?!?

2005-03-31 Thread Laszlo Zsolt Nagy
[EMAIL PROTECTED] wrote:
> Hello NG,
>
>   in my application, I use os.walk() to walk on a BIG directory. I need
> to retrieve the files, in each sub-directory, that are owned by a
> particular user. Noting that I am on Windows (2000 or XP), this is what I
> do:
>
> for root, dirs, files in os.walk(MyBIGDirectory):
>     a = os.popen("dir /q /-c /a-d " + root).read().split()
>
> Does anyone know if there is a faster way to do this job?

You can use the stat module to get attributes like last modification
date, uid, gid etc. The documentation of the stat module has a nice
example. It will probably be faster, because right now you are running an
external program (well, "dir" may be resident, but the OS still needs to
create a new shell and interpret the parameters on every invocation).
If the speed is the same, you may still want to use the stat module because:

- it is platform independent
- it is independent of any external program (for example, the DIR 
command can change in the future)
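
A minimal sketch of that approach, along the lines of the example in
the stat module's documentation:

    import os, stat, time

    st = os.stat("MyFile.txt")
    size = st[stat.ST_SIZE]       # size in bytes
    mtime = st[stat.ST_MTIME]     # last modification time
    print("%d bytes, modified %s" % (size, time.ctime(mtime)))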

Best,
Laci
--
_________________________________________
 Laszlo Nagy    web:  http://designasign.biz
 IT Consultant  mail: [EMAIL PROTECTED]
Python forever!
--
http://mail.python.org/mailman/listinfo/python-list