[jira] [Commented] (BEAM-7860) v1new ReadFromDatastore returns duplicates if keys are of mixed types

2019-08-07 Thread Udi Meiri (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902495#comment-16902495
 ] 

Udi Meiri commented on BEAM-7860:
-

merged

> v1new ReadFromDatastore returns duplicates if keys are of mixed types
> -
>
>                 Key: BEAM-7860
>                 URL: https://issues.apache.org/jira/browse/BEAM-7860
>             Project: Beam
>          Issue Type: Bug
>          Components: io-py-gcp
>    Affects Versions: 2.13.0
>         Environment: Python 2.7
>                      Python 3.7
>            Reporter: Niels Stender
>            Assignee: Udi Meiri
>            Priority: Blocker
>             Fix For: 2.15.0
>
>          Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> In the presence of mixed type keys, v1new ReadFromDatastore may return 
> duplicate items. The attached example returns 4 records, not the expected 3.
>  
> {code:python}
> from __future__ import unicode_literals
>
> import apache_beam as beam
> from apache_beam.io.gcp.datastore.v1new.types import Key, Entity, Query
> from apache_beam.io.gcp.datastore.v1new import datastoreio
>
> config = dict(project='your-google-project', namespace='test')
>
>
> def test_mixed():
>     # Two string-named keys and one integer-id key under the same kind.
>     keys = [
>         Key(['mixed', '10038260-iperm_eservice'], **config),
>         Key(['mixed', 4812224868188160], **config),
>         Key(['mixed', '99152975-pointshop'], **config),
>     ]
>     entities = [Entity(key=key) for key in keys]
>
>     # Write the three entities.
>     with beam.Pipeline() as p:
>         (p
>          | beam.Create(entities)
>          | datastoreio.WriteToDatastore(project=config['project'])
>         )
>
>     # Read them back with query splitting enabled.
>     query = Query(kind='mixed', **config)
>     with beam.Pipeline() as p:
>         (p
>          | datastoreio.ReadFromDatastore(query=query, num_splits=4)
>          | beam.io.WriteToText('tmp.txt', num_shards=1,
>                                shard_name_template='')
>         )
>
>     with open('tmp.txt') as f:
>         items = f.read().strip().split('\n')
>     assert len(items) == 3, 'incorrect number of items'
> {code}
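
The assertion in the example above only reports that the count is wrong. To see which records came back more than once, a small helper along these lines could be run on the lines of tmp.txt. This is a minimal sketch, not part of the original report; the name find_duplicates is hypothetical.

```python
from collections import Counter


def find_duplicates(lines):
    """Return a dict mapping each repeated line to its number of occurrences.

    Hypothetical helper for inspecting the output file of the repro; it is
    not part of Beam or of the original report.
    """
    counts = Counter(lines)
    return {line: n for line, n in counts.items() if n > 1}
```

Calling find_duplicates(items) on the list read from tmp.txt would return an empty dict when the read is correct, and the duplicated record(s) otherwise.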



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (BEAM-7860) v1new ReadFromDatastore returns duplicates if keys are of mixed types

2019-08-07 Thread yifan zou (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902396#comment-16902396
 ] 

yifan zou commented on BEAM-7860:
-

Great, thanks!






[jira] [Commented] (BEAM-7860) v1new ReadFromDatastore returns duplicates if keys are of mixed types

2019-08-07 Thread Udi Meiri (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902366#comment-16902366
 ] 

Udi Meiri commented on BEAM-7860:
-

Hopefully PR will be merged today.






[jira] [Commented] (BEAM-7860) v1new ReadFromDatastore returns duplicates if keys are of mixed types

2019-08-06 Thread yifan zou (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16901244#comment-16901244
 ] 

yifan zou commented on BEAM-7860:
-

Any ETA on this?






[jira] [Commented] (BEAM-7860) v1new ReadFromDatastore returns duplicates if keys are of mixed types

2019-08-02 Thread Udi Meiri (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899273#comment-16899273
 ] 

Udi Meiri commented on BEAM-7860:
-

It should be possible to work around this by setting num_splits=1 in 
ReadFromDatastore. You'll get a SplitNotPossibleError and performance will 
suffer (depending on the size of the result), but it'll be correct.
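
A minimal sketch of that workaround, assuming the v1new module from the issue description. The helper name read_without_splitting, the constant WORKAROUND_NUM_SPLITS and the step label 'ReadUnsplit' are hypothetical; only ReadFromDatastore and its query and num_splits arguments come from the report.

```python
# Assumption: a single split sidesteps the duplicate reads described in
# this issue, at the cost of reading the whole result in one query.
WORKAROUND_NUM_SPLITS = 1


def read_without_splitting(pipeline, query):
    """Attach a ReadFromDatastore step restricted to one split.

    `pipeline` is a beam.Pipeline and `query` is a v1new types.Query.
    Hypothetical helper, not part of Beam.
    """
    # Imported lazily so this module loads even without apache_beam[gcp].
    from apache_beam.io.gcp.datastore.v1new import datastoreio

    return pipeline | 'ReadUnsplit' >> datastoreio.ReadFromDatastore(
        query=query, num_splits=WORKAROUND_NUM_SPLITS)
```

In the repro from the issue description, this would stand in for the ReadFromDatastore(query=query, num_splits=4) step.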






[jira] [Commented] (BEAM-7860) v1new ReadFromDatastore returns duplicates if keys are of mixed types

2019-08-02 Thread Udi Meiri (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899271#comment-16899271
 ] 

Udi Meiri commented on BEAM-7860:
-

Verified that this happens in both v1 and v1new.



