Re: [Pytables-users] append to multiple tables

2013-06-10 Thread Anthony Scopatz
Hi Ed,

Are you inside of a nested loop?  You probably just need to flush after the
innermost loop.

Do you have some sample code you can share?

Be Well
Anthony


 On Mon, Jun 10, 2013 at 1:44 PM, Edward Vogel edwardvog...@gmail.com wrote:

 I have a dataset that I want to split between two tables. But, when I
 iterate over the data and append to both tables, I get a warning:

 /usr/local/lib/python2.7/site-packages/tables/table.py:2967:
 PerformanceWarning: table ``/cv2`` is being preempted from alive nodes
 without its buffers being flushed or with some index being dirty.  This may
 lead to very ineficient use of resources and even to fatal errors in
 certain situations.  Please do a call to the .flush() or .reindex_dirty()
 methods on this table before start using other nodes.

 However, if I flush after every append, I get awful performance.
 Is there a correct way to append to two tables without doing a flush?
 Note, I don't have any indices defined, so it seems reindex_dirty()
 doesn't apply.

 Thanks,
 Ed


 --
 This SF.net email is sponsored by Windows:

 Build for Windows Store.

 http://p.sf.net/sfu/windows-dev2dev
 ___
 Pytables-users mailing list
 Pytables-users@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/pytables-users




Re: [Pytables-users] append to multiple tables

2013-06-10 Thread Francesc Alted
Hi Ed,

After fixing the issue, has performance improved?  I'm the one 
who added the warning, so I'm curious whether it actually helps 
people or not.

Thanks,
Francesc

On 6/10/13 3:28 PM, Edward Vogel wrote:
 Yes, exactly.
 I'm pulling data out of C structures that have a one-to-many relationship, and 
 dumping it into PyTables for easier analysis. I'm creating extension 
 classes in Cython to get access to the C structures.
 It looks like this (basically, each cv1 has several cv2s):

 h5.create_table('/', 'cv1', schema_cv1)
 h5.create_table('/', 'cv2', schema_cv2)
 cv1_row = h5.root.cv1.row
 cv2_row = h5.root.cv2.row
 for cv in sf.itercv():
     cv1_row['addr'] = cv['addr']
     ...
     cv1_row.append()
     for cv2 in cv.itercv2():
         cv2_row['cv1_addr'] = cv['addr']
         cv2_row['foo'] = cv2['foo']
         ...
         cv2_row.append()
     h5.root.cv2.flush()  # This fixes the issue

 Adding the flush after the inner loop does fix the issue. (Thanks!)
 So, my follow-up question: why do I need a flush after the inner loop, 
 but not when moving from the outer loop into the inner loop?

 Thanks!
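The preemption behaviour behind that question can be sketched with a rough, stdlib-only analogy. Everything below is hypothetical and greatly simplified (the `ToyTable` and `AliveNodes` names, and the cache size, are made up for illustration, not the actual PyTables internals): each table keeps an in-memory row buffer, and the warning fires when a table is pushed out of a bounded cache of "alive" nodes while its buffer still holds unflushed rows.

```python
import warnings
from collections import OrderedDict

class ToyTable:
    """Hypothetical stand-in for a table with an in-memory row buffer."""
    def __init__(self, name):
        self.name = name
        self.buffer = []    # rows appended but not yet flushed
        self.on_disk = []   # rows safely written out

    def append(self, row):
        self.buffer.append(row)

    def flush(self):
        self.on_disk.extend(self.buffer)
        self.buffer = []

class AliveNodes:
    """Hypothetical LRU cache of 'alive' tables, capped at maxsize entries."""
    def __init__(self, maxsize):
        self.maxsize = maxsize
        self.cache = OrderedDict()

    def touch(self, table):
        # Using a table marks it most-recently-used; overflowing the cache
        # evicts the least-recently-used table, warning if it is still dirty.
        self.cache[table.name] = table
        self.cache.move_to_end(table.name)
        while len(self.cache) > self.maxsize:
            _, evicted = self.cache.popitem(last=False)
            if evicted.buffer:
                warnings.warn("table %r preempted without its buffers "
                              "being flushed" % evicted.name)

# Demo: switching to another table while cv1 still has unflushed rows.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    nodes = AliveNodes(maxsize=1)       # tiny cache to force preemption
    cv1, cv2 = ToyTable('cv1'), ToyTable('cv2')
    nodes.touch(cv1); cv1.append({'addr': 1})
    nodes.touch(cv2)                    # evicts dirty cv1 -> warning
print(len(caught))  # 1 warning recorded
```

Calling `flush()` on the dirty table before touching the other one leaves the evicted buffer empty, so no warning is raised — which is the behaviour the real flush-after-the-inner-loop fix relies on.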





-- 
Francesc Alted




Re: [Pytables-users] append to multiple tables

2013-06-10 Thread Edward Vogel
I initially didn't sync at all until writing completed - about 1
million rows total. My main concern was preventing data corruption. After
seeing the warning I synced on every iteration of the inner loop,
which was slow. Syncing after the inner loop is slightly slower than not
syncing, but seems fine.
Thanks,
Ed
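Ed's trade-off above also admits a middle ground: flush every N rows rather than on every append or only at the very end. A minimal, stdlib-only sketch of that batching pattern (the `BufferedTable` class and `append_batched` helper are hypothetical stand-ins for illustration, not the PyTables API):

```python
class BufferedTable:
    """Hypothetical stand-in for a table with buffered appends."""
    def __init__(self):
        self.buffer = []    # rows not yet flushed
        self.on_disk = []   # rows safely written

    def append(self, row):
        self.buffer.append(row)

    def flush(self):
        self.on_disk.extend(self.buffer)
        self.buffer = []

def append_batched(table, rows, every=1000):
    """Append rows, flushing after every `every` appends and once at the end."""
    for i, row in enumerate(rows, 1):
        table.append(row)
        if i % every == 0:
            table.flush()   # bounds unflushed data without per-row flush cost
    table.flush()           # catch the final partial batch

t = BufferedTable()
append_batched(t, range(2500), every=1000)
print(len(t.on_disk), len(t.buffer))  # 2500 0
```

Tuning `every` trades a bounded amount of at-risk data against flush overhead; flushing once per inner loop, as in the thread, is the same idea with a natural batch boundary.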




Re: [Pytables-users] append to multiple tables

2013-06-10 Thread Francesc Alted
Ah, that's good to know.  Yes, I can see the warning definitely helps 
people flush periodically and prevents data corruption.

Thanks for the feedback,
Francesc
