[jira] [Commented] (BEAM-7860) v1new ReadFromDatastore returns duplicates if keys are of mixed types
[ https://issues.apache.org/jira/browse/BEAM-7860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902495#comment-16902495 ]

Udi Meiri commented on BEAM-7860:
---------------------------------

merged

> v1new ReadFromDatastore returns duplicates if keys are of mixed types
> ---------------------------------------------------------------------
>
>                 Key: BEAM-7860
>                 URL: https://issues.apache.org/jira/browse/BEAM-7860
>             Project: Beam
>          Issue Type: Bug
>          Components: io-py-gcp
>    Affects Versions: 2.13.0
>         Environment: Python 2.7
> Python 3.7
>            Reporter: Niels Stender
>            Assignee: Udi Meiri
>            Priority: Blocker
>             Fix For: 2.15.0
>
>          Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> In the presence of mixed type keys, v1new ReadFromDatastore may return
> duplicate items. The attached example returns 4 records, not the expected 3.
>
> {code:java}
> // code placeholder
> from __future__ import unicode_literals
> import apache_beam as beam
> from apache_beam.io.gcp.datastore.v1new.types import Key, Entity, Query
> from apache_beam.io.gcp.datastore.v1new import datastoreio
> config = dict(project='your-google-project', namespace='test')
> def test_mixed():
>     keys = [
>         Key(['mixed', '10038260-iperm_eservice'], **config),
>         Key(['mixed', 4812224868188160], **config),
>         Key(['mixed', '99152975-pointshop'], **config)
>     ]
>     entities = map(lambda key: Entity(key=key), keys)
>     with beam.Pipeline() as p:
>         (p
>          | beam.Create(entities)
>          | datastoreio.WriteToDatastore(project=config['project'])
>         )
>     query = Query(kind='mixed', **config)
>     with beam.Pipeline() as p:
>         (p
>          | datastoreio.ReadFromDatastore(query=query, num_splits=4)
>          | beam.io.WriteToText('tmp.txt', num_shards=1,
>                                shard_name_template='')
>         )
>     items = open('tmp.txt').read().strip().split('\n')
>     assert len(items) == 3, 'incorrect number of items'
> {code}

--
This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (BEAM-7860) v1new ReadFromDatastore returns duplicates if keys are of mixed types
[ https://issues.apache.org/jira/browse/BEAM-7860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902396#comment-16902396 ]

yifan zou commented on BEAM-7860:
---------------------------------

Great, thanks!

> v1new ReadFromDatastore returns duplicates if keys are of mixed types
> ---------------------------------------------------------------------
>
>                 Key: BEAM-7860
>                 URL: https://issues.apache.org/jira/browse/BEAM-7860
>             Project: Beam
>          Issue Type: Bug
>          Components: io-python-gcp
>    Affects Versions: 2.13.0
>         Environment: Python 2.7
> Python 3.7
>            Reporter: Niels Stender
>            Assignee: Udi Meiri
>            Priority: Blocker
>             Fix For: 2.15.0
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
[jira] [Commented] (BEAM-7860) v1new ReadFromDatastore returns duplicates if keys are of mixed types
[ https://issues.apache.org/jira/browse/BEAM-7860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902366#comment-16902366 ]

Udi Meiri commented on BEAM-7860:
---------------------------------

Hopefully PR will be merged today.

> v1new ReadFromDatastore returns duplicates if keys are of mixed types
> ---------------------------------------------------------------------
>
>                 Key: BEAM-7860
>                 URL: https://issues.apache.org/jira/browse/BEAM-7860
>             Project: Beam
>          Issue Type: Bug
>          Components: io-python-gcp
>    Affects Versions: 2.13.0
>         Environment: Python 2.7
> Python 3.7
>            Reporter: Niels Stender
>            Assignee: Udi Meiri
>            Priority: Blocker
>             Fix For: 2.15.0
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
[jira] [Commented] (BEAM-7860) v1new ReadFromDatastore returns duplicates if keys are of mixed types
[ https://issues.apache.org/jira/browse/BEAM-7860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16901244#comment-16901244 ]

yifan zou commented on BEAM-7860:
---------------------------------

Any ETA on this?

> v1new ReadFromDatastore returns duplicates if keys are of mixed types
> ---------------------------------------------------------------------
>
>                 Key: BEAM-7860
>                 URL: https://issues.apache.org/jira/browse/BEAM-7860
>             Project: Beam
>          Issue Type: Bug
>          Components: io-python-gcp
>    Affects Versions: 2.13.0
>         Environment: Python 2.7
> Python 3.7
>            Reporter: Niels Stender
>            Assignee: Udi Meiri
>            Priority: Blocker
>             Fix For: 2.15.0
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
[jira] [Commented] (BEAM-7860) v1new ReadFromDatastore returns duplicates if keys are of mixed types
[ https://issues.apache.org/jira/browse/BEAM-7860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899273#comment-16899273 ]

Udi Meiri commented on BEAM-7860:
---------------------------------

It should be possible to work around this by setting num_splits=1 in
ReadFromDatastore. You'll get a SplitNotPossibleError and performance will
suffer (depending on the size of the result), but the output will be correct.

> v1new ReadFromDatastore returns duplicates if keys are of mixed types
> ---------------------------------------------------------------------
>
>                 Key: BEAM-7860
>                 URL: https://issues.apache.org/jira/browse/BEAM-7860
>             Project: Beam
>          Issue Type: Bug
>          Components: io-python-gcp
>    Affects Versions: 2.13.0
>         Environment: Python 2.7
> Python 3.7
>            Reporter: Niels Stender
>            Assignee: Udi Meiri
>            Priority: Major
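
For reference, the num_splits=1 workaround from the comment above could be sketched roughly as follows. This is only an illustration that reuses the calls from the reporter's example; the function name read_unsplit and its parameters are made up here, and the sketch assumes apache_beam with the GCP extras is installed and that Datastore credentials are available when the function is actually called.

```python
def read_unsplit(project, namespace, kind, output_path):
    """Read all entities of `kind` without query splitting (hypothetical helper)."""
    # Imports are deferred so that defining this helper does not require Beam.
    import apache_beam as beam
    from apache_beam.io.gcp.datastore.v1new import datastoreio
    from apache_beam.io.gcp.datastore.v1new.types import Query

    query = Query(kind=kind, project=project, namespace=namespace)
    with beam.Pipeline() as p:
        (p
         # num_splits=1 asks for a single, unsplit read, as suggested in the
         # comment; expect slower reads on large result sets.
         | datastoreio.ReadFromDatastore(query=query, num_splits=1)
         | beam.io.WriteToText(output_path, num_shards=1,
                               shard_name_template=''))
```

With the reporter's data, calling read_unsplit('your-google-project', 'test', 'mixed', 'tmp.txt') would be expected to write 3 lines rather than 4, if the workaround behaves as described in the comment.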
[jira] [Commented] (BEAM-7860) v1new ReadFromDatastore returns duplicates if keys are of mixed types
[ https://issues.apache.org/jira/browse/BEAM-7860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899271#comment-16899271 ]

Udi Meiri commented on BEAM-7860:
---------------------------------

Verified that this happens in both v1 and v1new.

> v1new ReadFromDatastore returns duplicates if keys are of mixed types
> ---------------------------------------------------------------------
>
>                 Key: BEAM-7860
>                 URL: https://issues.apache.org/jira/browse/BEAM-7860
>             Project: Beam
>          Issue Type: Bug
>          Components: io-python-gcp
>    Affects Versions: 2.13.0
>         Environment: Python 2.7
> Python 3.7
>            Reporter: Niels Stender
>            Assignee: Udi Meiri
>            Priority: Major