[jira] [Commented] (BEAM-5628) Several VcfIO tests fail in Python 3 with TypeError: cannot use a string pattern on a bytes-like object

2018-10-12 Thread Valentyn Tymofieiev (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648695#comment-16648695
 ] 

Valentyn Tymofieiev commented on BEAM-5628:
---

cc [~arostami], one of VCF IO authors.

> Several VcfIO tests fail in Python 3 with  TypeError: cannot use a string 
> pattern on a bytes-like object
> 
>
> Key: BEAM-5628
> URL: https://issues.apache.org/jira/browse/BEAM-5628
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Simon
>Priority: Major
>
> ERROR: test_read_after_splitting (apache_beam.io.vcfio_test.VcfSourceTest)
> "
>  --
> Traceback (most recent call last):
>File 
> ""/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/vcfio_test.py"",
>  line 336, in test_read_after_splitting
> ] split_records.extend(source_test_utils.read_from_source(*source_info))
> ]   File 
> ""/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/source_test_utils.py"",
>  line 101, in read_from_source
>  for value in reader:
>File 
> ""/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/vcfio.py"",
>  line 264, in read_records
>  for line in record_iterator:
>File 
> ""/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/vcfio.py"",
>  line 330, in __next__
>  record = next(self._vcf_reader)
>File 
> ""/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/vcf/parser.py"",
>  line 543, in __next__
>  row = self._row_pattern.split(line.rstrip())
>  TypeError: cannot use a string pattern on a bytes-like object
> "



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5628) Several VcfIO tests fail in Python 3 with TypeError: cannot use a string pattern on a bytes-like object

2018-10-12 Thread Valentyn Tymofieiev (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648694#comment-16648694
 ] 

Valentyn Tymofieiev commented on BEAM-5628:
---

Looks like VCF IO is using another package (PyVCF). We need to understand 
whether the issue is because we don't use PyVCF correctly (VCF IO issue), or 
because PyVCF itself is not Python3-compatible.

> Several VcfIO tests fail in Python 3 with  TypeError: cannot use a string 
> pattern on a bytes-like object
> 
>
> Key: BEAM-5628
> URL: https://issues.apache.org/jira/browse/BEAM-5628
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Simon
>Priority: Major
>
> ERROR: test_read_after_splitting (apache_beam.io.vcfio_test.VcfSourceTest)
> "
>  --
> Traceback (most recent call last):
>File 
> ""/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/vcfio_test.py"",
>  line 336, in test_read_after_splitting
> ] split_records.extend(source_test_utils.read_from_source(*source_info))
> ]   File 
> ""/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/source_test_utils.py"",
>  line 101, in read_from_source
>  for value in reader:
>File 
> ""/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/vcfio.py"",
>  line 264, in read_records
>  for line in record_iterator:
>File 
> ""/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/vcfio.py"",
>  line 330, in __next__
>  record = next(self._vcf_reader)
>File 
> ""/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/vcf/parser.py"",
>  line 543, in __next__
>  row = self._row_pattern.split(line.rstrip())
>  TypeError: cannot use a string pattern on a bytes-like object
> "



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5628) Several VcfIO tests fail in Python 3 with TypeError: cannot use a string pattern on a bytes-like object

2018-10-12 Thread Simon (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647969#comment-16647969
 ] 

Simon commented on BEAM-5628:
-

This error can be traced back to the _create_generator function (io/vcfio.py: 
line 318), where it is mentioned that PyVCF has explicit str() calls when 
parsing INFO fields, which fails with UTF-8 decoded strings. For this reason, 
the line is encoded back to UTF-8 in the python2 version. 

Because removing the encoding step results in hanging of some tests, there is a 
chance this relates to 5623.

Does anyone have additional insights?

> Several VcfIO tests fail in Python 3 with  TypeError: cannot use a string 
> pattern on a bytes-like object
> 
>
> Key: BEAM-5628
> URL: https://issues.apache.org/jira/browse/BEAM-5628
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Simon
>Priority: Major
>
> ERROR: test_read_after_splitting (apache_beam.io.vcfio_test.VcfSourceTest)
> "
>  --
> Traceback (most recent call last):
>File 
> ""/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/vcfio_test.py"",
>  line 336, in test_read_after_splitting
> ] split_records.extend(source_test_utils.read_from_source(*source_info))
> ]   File 
> ""/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/source_test_utils.py"",
>  line 101, in read_from_source
>  for value in reader:
>File 
> ""/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/vcfio.py"",
>  line 264, in read_records
>  for line in record_iterator:
>File 
> ""/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/vcfio.py"",
>  line 330, in __next__
>  record = next(self._vcf_reader)
>File 
> ""/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/vcf/parser.py"",
>  line 543, in __next__
>  row = self._row_pattern.split(line.rstrip())
>  TypeError: cannot use a string pattern on a bytes-like object
> "



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)