Re: [Pytables-users] append to multiple tables

2013-06-10 Thread Francesc Alted
Ah, that's good to know.  Yes, I can see the warning is definitely helping
people to flush periodically and helping to prevent data corruption.

Thanks for the feedback,
Francesc


Re: [Pytables-users] append to multiple tables

2013-06-10 Thread Edward Vogel
Initially I didn't sync at all until all writing was complete (about 1
million rows in total). My main concern was preventing data corruption. After
seeing the warning I added a sync on every iteration of the inner loop,
which was slow. Syncing once after the inner loop is a little slower than not
syncing at all, but seems fine.
Thanks,
Ed



Re: [Pytables-users] append to multiple tables

2013-06-10 Thread Francesc Alted
Hi Ed,

After fixing the issue, has performance improved?  I'm the one
who added the warning, so I'm curious whether it actually helps
people or not.

Thanks,
Francesc



-- 
Francesc Alted


--
This SF.net email is sponsored by Windows:

Build for Windows Store.

http://p.sf.net/sfu/windows-dev2dev
___
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users


Re: [Pytables-users] append to multiple tables

2013-06-10 Thread Anthony Scopatz
On Mon, Jun 10, 2013 at 2:28 PM, Edward Vogel wrote:

> Yes, exactly.
> I'm pulling data out of C that has a one-to-many relationship, and dumping
> it into PyTables for easier analysis. I'm creating extension classes in
> Cython to get access to the C structures.
> It looks like this (basically, each cv1 has several cv2s):
>
> h5.create_table('/', 'cv1', schema_cv1)
> h5.create_table('/', 'cv2', schema_cv2)
> cv1_row = h5.root.cv1.row
> cv2_row = h5.root.cv2.row
> for cv in sf.itercv():
>     cv1_row['addr'] = cv['addr']
>     ...
>     cv1_row.append()
>     for cv2 in cv.itercv2():
>         cv2_row['cv1_addr'] = cv['addr']
>         cv2_row['foo'] = cv2['foo']
>         ...
>         cv2_row.append()
>     h5.root.cv2.flush()  # This fixes issue
>
> Adding the flush after the inner loop does fix the issue. (Thanks!)
>

No problem!  I am glad this worked.


> So, my follow-up question: why do I need a flush after the inner loop, but
> not when moving from the outer loop to the inner loop?
>

It has to do with when the write buffer gets created / filled / flushed.
These steps need to happen at the proper time or you can lose the
data you were writing, overflow memory, etc.

Be Well
Anthony
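
To make the buffer point concrete, here is a toy model in plain Python. It is
only a sketch: ToyBufferedTable, load() and the row counts are made up for
illustration and are not PyTables' real buffering code. It just counts how
often the buffer is written out when flushing after every append versus once
after each inner loop.

```python
class ToyBufferedTable:
    """Toy stand-in for a table with an in-memory write buffer.

    Assumption: a simplified model for illustration only, not how
    PyTables actually implements its buffers.
    """

    def __init__(self, name):
        self.name = name
        self.buffer = []      # rows appended but not yet written out
        self.stored = []      # rows that have been "written to disk"
        self.flush_calls = 0  # how many times flush() was called

    def append(self, row):
        # Appending only touches the in-memory buffer.
        self.buffer.append(row)

    def flush(self):
        # Flushing moves everything buffered so far into storage.
        self.stored.extend(self.buffer)
        self.buffer.clear()
        self.flush_calls += 1


def load(parents, children_per_parent, flush_every_row):
    """Append child rows for each parent using one of two flush strategies."""
    cv2 = ToyBufferedTable("cv2")
    for addr in range(parents):
        for j in range(children_per_parent):
            cv2.append((addr, j))
            if flush_every_row:
                cv2.flush()  # one flush per appended row
        if not flush_every_row:
            cv2.flush()      # one flush per parent, after the inner loop
    return cv2


per_row = load(1000, 10, flush_every_row=True)
per_parent = load(1000, 10, flush_every_row=False)
print(per_row.flush_calls, per_parent.flush_calls)
```

Both strategies end up storing the same rows; the difference is only in how
many times the buffer is written out, which is where the per-append flush
loses its time.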




Re: [Pytables-users] append to multiple tables

2013-06-10 Thread Edward Vogel
Yes, exactly.
I'm pulling data out of C that has a one-to-many relationship, and dumping it
into PyTables for easier analysis. I'm creating extension classes in Cython
to get access to the C structures.
It looks like this (basically, each cv1 has several cv2s):

h5.create_table('/', 'cv1', schema_cv1)
h5.create_table('/', 'cv2', schema_cv2)
cv1_row = h5.root.cv1.row
cv2_row = h5.root.cv2.row
for cv in sf.itercv():
    # One cv1 row per parent record
    cv1_row['addr'] = cv['addr']
    ...
    cv1_row.append()
    for cv2 in cv.itercv2():
        # Several cv2 rows per parent, linked back by cv1_addr
        cv2_row['cv1_addr'] = cv['addr']
        cv2_row['foo'] = cv2['foo']
        ...
        cv2_row.append()
    h5.root.cv2.flush()  # This fixes issue

Adding the flush after the inner loop does fix the issue. (Thanks!)
So, my follow-up question: why do I need a flush after the inner loop, but
not when moving from the outer loop to the inner loop?

Thanks!
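
For anyone who wants to try the pattern end to end, below is a self-contained
sketch. It assumes PyTables 3.x is installed; the CV1/CV2 schemas, the
write_tables() helper, the row counts and the temporary file are hypothetical
stand-ins for schema_cv1, schema_cv2 and the Cython iterators above, which
are not shown in this thread.

```python
import os
import tempfile

import tables


class CV1(tables.IsDescription):
    # Hypothetical schema standing in for schema_cv1.
    addr = tables.Int64Col()


class CV2(tables.IsDescription):
    # Hypothetical schema standing in for schema_cv2.
    cv1_addr = tables.Int64Col()
    foo = tables.Float64Col()


def write_tables(path, n_cv1=100, n_cv2_per_cv1=5):
    """Append parent rows to cv1 and child rows to cv2.

    cv2 is flushed once per parent record (after the inner loop),
    not once per appended child row.
    """
    with tables.open_file(path, mode="w") as h5:
        cv1 = h5.create_table("/", "cv1", CV1)
        cv2 = h5.create_table("/", "cv2", CV2)
        cv1_row = cv1.row
        cv2_row = cv2.row
        for addr in range(n_cv1):
            cv1_row["addr"] = addr
            cv1_row.append()
            for j in range(n_cv2_per_cv1):
                cv2_row["cv1_addr"] = addr
                cv2_row["foo"] = float(j)
                cv2_row.append()
            cv2.flush()  # after the inner loop
        cv1.flush()  # write out any parent rows still buffered
        return cv1.nrows, cv2.nrows


path = os.path.join(tempfile.mkdtemp(), "cv_demo.h5")
n1, n2 = write_tables(path)
print(n1, n2)
```

The sketch only demonstrates where the flush calls sit relative to the two
loops; it makes no claim about whether the PerformanceWarning appears in a
given PyTables version.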





Re: [Pytables-users] append to multiple tables

2013-06-10 Thread Anthony Scopatz
Hi Ed,

Are you inside of a nested loop?  You probably just need to flush after the
innermost loop.

Do you have some sample code you can share?

Be Well
Anthony


On Mon, Jun 10, 2013 at 1:44 PM, Edward Vogel wrote:

> I have a dataset that I want to split between two tables. But, when I
> iterate over the data and append to both tables, I get a warning:
>
> /usr/local/lib/python2.7/site-packages/tables/table.py:2967:
> PerformanceWarning: table ``/cv2`` is being preempted from alive nodes
> without its buffers being flushed or with some index being dirty.  This may
> lead to very ineficient use of resources and even to fatal errors in
> certain situations.  Please do a call to the .flush() or .reindex_dirty()
> methods on this table before start using other nodes.
>
> However, if I flush after every append, I get awful performance.
> Is there a correct way to append to two tables without doing a flush?
> Note, I don't have any indices defined, so it seems reindex_dirty()
> doesn't apply.
>
> Thanks,
> Ed