[ 
https://issues.apache.org/jira/browse/BEAM-3485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16328252#comment-16328252
 ] 

Aleksandr Sosenko commented on BEAM-3485:
-----------------------------------------

Should it really be token(pk)? I got it working by hardcoding the real primary 
key column name in place of $pk (the Stack Overflow question is mine). Is there 
some magic that substitutes the real primary key column name for pk?

I can't create a pull request yet because I don't know how to elegantly 
retrieve the primary key column name in the context of the split function. I 
could make an additional request to Cassandra for the table description, but 
that seems overcomplicated to me. Is there another way to get the primary key 
column name in the context of that function?
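
To make the question concrete, here is a minimal sketch of the kind of query 
the split would need to build once the partition key column name(s) are known. 
The class and method names (TokenRangeQuery, build) and the sample keyspace and 
table are hypothetical and not taken from CassandraIO. The (start, end] bounds 
are an assumption about how the token ranges are meant to be sliced.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of CassandraIO: builds a CQL token-range
// query from the real partition key column names instead of a literal
// "pk" or "$pk" placeholder.
final class TokenRangeQuery {

    // Assumes half-open ranges (start, end]; adjust if the split logic
    // slices the ring differently.
    static String build(String keyspace, String table,
                        List<String> partitionKeyColumns,
                        long start, long end) {
        // token() takes all partition key columns, in declaration order.
        String keys = String.join(", ", partitionKeyColumns);
        return "SELECT * FROM " + keyspace + "." + table
            + " WHERE token(" + keys + ") > " + start
            + " AND token(" + keys + ") <= " + end;
    }

    public static void main(String[] args) {
        // Single-column partition key.
        System.out.println(build("ks", "users",
            Arrays.asList("user_id"), -100L, 100L));
        // Composite partition key.
        System.out.println(build("ks", "events",
            Arrays.asList("tenant", "day"), 0L, 50L));
    }
}
```

As for where the names could come from: if the connector is on the DataStax 
Java driver 3.x (an assumption, I have not checked this against the Beam 
code), the driver keeps table metadata on the client, so 
cluster.getMetadata().getKeyspace(ks).getTable(table).getPartitionKey() 
returns the partition key columns without a hand-written query against the 
system tables.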

> CassandraIO.read() splitting produces invalid queries
> -----------------------------------------------------
>
>                 Key: BEAM-3485
>                 URL: https://issues.apache.org/jira/browse/BEAM-3485
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-extensions
>            Reporter: Eugene Kirpichov
>            Assignee: Jean-Baptiste Onofré
>            Priority: Major
>
> See 
> [https://stackoverflow.com/questions/48090668/how-to-increase-dataflow-read-parallelism-from-cassandra/48131264?noredirect=1#comment83548442_48131264]
> As the question author points out, the error is likely that token($pk) should 
> be token(pk). This was likely masked by BEAM-3424 and BEAM-3425, and the 
> splitting code path effectively was never invoked, and was broken from the 
> first PR - so there are likely other bugs.
> When testing this issue, we must ensure good code coverage in an IT against a 
> real Cassandra instance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)