On Fri, May 3, 2013 at 2:52 PM, Jim Knoll <jim.kn...@spottradingllc.com> wrote:

> Speed is the problem.  I am looking for the fastest possible way to do
> this.  I was thinking of using pandas and was able to achieve fair
> performance using that lib, but it seemed like I was using pandas as a
> middleman, and it introduced some issues with the data types.  Could it be
> faster to pull it into a numpy array in chunks and write it out?
>

I think that whereAppend() is going to be the fastest, then.  The other
option is to pull the table out in chunks into numpy arrays and then write
it back out.  This is almost certainly slower, because you will be iterating
in Python (not C) and you will not be multi-threaded.  You get the
multi-threading through whereAppend(dst, 'True') because it is using numexpr
under the hood.
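
For instance, sketched against your concatenation script below (untested; it
assumes every source node is a Table that already has a matching table in
the master file, and the path names here are just placeholders):

import tables as pt

path = './'                                      # hypothetical output directory
source_h5_path_list = ['part1.h5', 'part2.h5']   # hypothetical source files

dest_h5f = pt.openFile(path + 'big_master.h5', 'a')
for source_path in source_h5_path_list:
    h5f = pt.openFile(source_path, 'r')
    for node in h5f.root:
        dest_table = dest_h5f.getNode('/', name=node.name)
        # whereAppend evaluates the condition with numexpr and does the
        # chunking internally; the condition 'True' selects every row.
        node.whereAppend(dest_table, 'True')
        dest_table.flush()
    h5f.close()
dest_h5f.close()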

Be Well
Anthony


>
> From: Anthony Scopatz [mailto:scop...@gmail.com]
> Sent: Friday, May 03, 2013 2:14 PM
> To: Discussion list for PyTables
> Subject: Re: [Pytables-users] Row.append()
>
> On Fri, May 3, 2013 at 1:15 PM, Jim Knoll <jim.kn...@spottradingllc.com>
> wrote:
>
> I am trying to make this better / faster.
>
> Data comes faster than I can store it on one box, so my thought was to
> have many boxes, each storing their own part in their own table.
>
> Later I would concatenate the tables together with something like this:
>
> dest_h5f = pt.openFile(path + 'big_master.h5', 'a')
> for source_path in source_h5_path_list:
>     h5f = pt.openFile(source_path, 'r')
>     for node in h5f.root:
>         dest_table = dest_h5f.getNode('/', name=node.name)
>         print node.nrows
>         # found I needed to limit the max size or I would crash
>         if node.nrows > 0 and node.nrows < 1000000:
>             dest_table.append(node.read())
>             dest_table.flush()
>     h5f.close()
> dest_h5f.close()
>
> I could add the logic to iterate in chunks over the source data to
> overcome the crash, but I suspect there could be a better way.
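>
> I was thinking of something along these lines (just a rough sketch; the
> 100,000-row chunk size is an arbitrary guess, and node / dest_table are
> the same variables as in the loop above):
>
> chunk = 100000   # rows per read; tune to available memory
> for start in range(0, node.nrows, chunk):
>     stop = min(start + chunk, node.nrows)
>     # read() with start/stop pulls only this slice into memory
>     dest_table.append(node.read(start=start, stop=stop))
> dest_table.flush()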
>
> Hi Jim,
>
> You can just iterate over each row in the table (i.e. "for row in node").
> This is slow, but it would solve the problem.
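>
> Roughly like this (just a sketch; it assumes the destination table has the
> same description as the source):
>
> dst_row = dest_table.row
> for src_row in node:               # yields one row at a time
>     for name in node.colnames:     # copy every column by name
>         dst_row[name] = src_row[name]
>     dst_row.append()
> dest_table.flush()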
>
> Take a table in one h5 file and append it to a table in another h5 file.
> It looked like Table.copy() would do the trick, but I don't see how to get
> it to append to an existing table.
>
> You could append directly by using the whereAppend() method with the
> condition 'True' to append the whole table.  This will automatically do
> the chunking for you.
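>
> In terms of your script, the read()/append() pair becomes (a sketch):
>
> node.whereAppend(dest_table, 'True')  # appends every row, chunked internally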
>
> ** **
>
> Be Well
>
> Anthony
>
> My h5 files have 4 rec arrays, all stored in root.
>
> Any suggestions?
>
> ------------------------------
>
> Jim Knoll
> DBA/Developer II
>
> Spot Trading L.L.C
> 440 South LaSalle St., Suite 2800
> Chicago, IL 60605
> Office: 312.362.4550
> Direct: 312-362-4798
> Fax: 312.362.4551
> jim.kn...@spottradingllc.com
> www.spottradingllc.com
> ------------------------------