Re: BigQuery table backed by Google Sheet

2017-05-04 Thread Davor Bonaci
This sounds like a Dataflow-specific question, so it might be better
addressed by a StackOverflow question tagged with google-cloud-dataflow [1].

I'd suggest checking the Dataflow documentation for more information about
its security model [2]. You'd likely have to use a service account, as
described there for a few scenarios.

Hope this helps, but please don't hesitate to post on StackOverflow for any
Dataflow-specific questions.

Davor

[1] http://stackoverflow.com/questions/tagged/google-cloud-dataflow
[2] https://cloud.google.com/dataflow/security-and-permissions
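
For what it's worth, reading a Sheets-backed BigQuery table also requires the
credential to carry a Google Drive scope, and the service account needs read
access to the sheet itself (as noted further down this thread). Below is a
minimal sketch of building such a credential with google-auth-library; it
assumes Application Default Credentials are configured, and the class name and
exact scope list are illustrative rather than authoritative:

import com.google.auth.oauth2.GoogleCredentials;
import java.io.IOException;
import java.util.Arrays;

public class SheetScopedCredential {
  public static void main(String[] args) throws IOException {
    // Re-scope the Application Default Credentials so BigQuery can read the
    // Drive-hosted sheet behind the table. The service account must also be
    // granted reader access to the sheet itself.
    GoogleCredentials credentials = GoogleCredentials.getApplicationDefault()
        .createScoped(Arrays.asList(
            "https://www.googleapis.com/auth/cloud-platform",
            "https://www.googleapis.com/auth/bigquery",
            "https://www.googleapis.com/auth/drive.readonly"));
    System.out.println("Scoped credential: " + credentials);
  }
}

How the scoped credential is then handed to the pipeline depends on the SDK
version (for example via the GcpOptions credential settings), so check the
documentation linked above.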

On Wed, May 3, 2017 at 11:08 PM, Prabeesh K.  wrote:

> Hi Anil,
>
> Thank you for trying to answer the question, but that is a rather broad
> answer.
>
> The problem is that we need to add an additional scope to Beam so it can read
> the Google Sheet, and grant the service account permission to read the sheet.
>
> Regards,
> Prabeesh K.
>
>
> On 4 May 2017 at 08:58, Anil Srinivas 
> wrote:
>
>> Hi,
>> The problem, as I see it, is that you don't have permission to access the
>> data in the BigQuery table. Make sure you log in with an account that has
>> permission to read/write data in the table.
>>
>>
>>
>>
>> Regards,
>> Anil
>>
>>
>> On Wed, May 3, 2017 at 6:21 PM, Prabeesh K.  wrote:
>>
>>> Hi,
>>>
>>> How can we read a BigQuery table that is backed by a Google Sheet?
>>>
>>> I am getting the following error:
>>>
>>> {
>>>   "error": {
>>>     "errors": [
>>>       {
>>>         "domain": "global",
>>>         "reason": "accessDenied",
>>>         "message": "Access Denied: BigQuery BigQuery: Permission denied while globbing file pattern.",
>>>         "locationType": "other",
>>>         "location": "/gdrive/id/--"
>>>       }
>>>     ],
>>>     "code": 403,
>>>     "message": "Access Denied: BigQuery BigQuery: Permission denied while globbing file pattern."
>>>   }
>>> }
>>>
>>>
>>> Please help me fix this issue.
>>>
>>>
>>> Regards,
>>> Prabeesh K.
>>>
>>>
>>>
>>
>


Re: Slack channel invite pls

2017-05-04 Thread Parker Coleman
Can I also get a slack invite?  adinsx...@gmail.com

On Thu, May 4, 2017 at 9:36 AM, Lukasz Cwik  wrote:

> Sent
>
> On Thu, May 4, 2017 at 9:32 AM, Seshadri Raghunathan 
> wrote:
>
>> Hi ,
>>
>> Please add me to the Slack channel in the next possible cycle.
>>
>> sesh...@gmail.com
>>
>> Thanks,
>> Seshadri
>>
>
>


Re: KafkaIO nothing received?

2017-05-04 Thread Mingmin Xu
@Conrad,

Your code should be good to go; I can run it in my local environment. There are
two points you may want to check:
1) Does the topic actually have data? You can confirm with the Kafka CLI tool
bin/kafka-console-consumer.sh, or with the standalone consumer sketch below.
2) Is the port in bootstrapServers correct? By default it's 9092.
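
As a quick way to rule out the Beam side, here is a minimal smoke test using
the plain kafka-clients consumer API. This is only a sketch: it assumes a
0.9.x kafka-clients dependency on the classpath, and the broker address, topic
name and group id below are illustrative.

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class TopicSmokeTest {
  public static void main(String[] args) {
    Properties props = new Properties();
    // Adjust host:port to match the broker's advertised listener.
    props.put("bootstrap.servers", "datanode2-cm1.mis-cds.local:6667");
    props.put("group.id", "beam-smoke-test");
    props.put("auto.offset.reset", "earliest");
    props.put("key.deserializer",
        "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer",
        "org.apache.kafka.common.serialization.StringDeserializer");

    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
      consumer.subscribe(Collections.singletonList("test-http-logs-json"));
      // Poll a few times and print whatever comes back.
      for (int i = 0; i < 5; i++) {
        ConsumerRecords<String, String> records = consumer.poll(2000);
        for (ConsumerRecord<String, String> record : records) {
          System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
        }
      }
    }
  }
}

If this prints nothing either, the problem is on the Kafka side (empty topic,
wrong port, or an advertised listener that is not reachable from the client)
rather than in the Beam pipeline.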



On Thu, May 4, 2017 at 9:05 AM, Conrad Crampton  wrote:

> Hi,
>
> New to the group – ‘hello’!
>
>
>
> Just starting to look into Beam and I very much like the concepts, but
> have rather fallen at the first hurdle – that being trying to subscribe to
> a kafka topic and process results.
>
> Very simply, the following code doesn't receive any records (the data
> is going into the queue) – I just get nothing.
>
> I have tried on both direct-runner and flink-runner (using the Quickstart
> as a base for options, mvn profile etc.)
>
>
>
> Code
>
>
>
> Pipeline p = Pipeline.create(options);
>
> List<String> topics = ImmutableList.of("test-http-logs-json");
>
> PCollection<String> logs = p.apply(KafkaIO.<String, String>read()
>     .withBootstrapServers(
>         "datanode2-cm1.mis-cds.local:6667,datanode3-cm1.mis-cds.local:6667,datanode6-cm1.mis-cds.local:6667")
>     .withTopics(topics)
>     .withKeyCoder(StringUtf8Coder.of())
>     .withValueCoder(StringUtf8Coder.of())
>     .withMaxNumRecords(10)
>     .updateConsumerProperties(ImmutableMap.<String, Object>builder()
>         .put("auto.offset.reset", (Object) "earliest")
>         .put("group.id", (Object) "http-logs-beam-json")
>         .put("enable.auto.commit", (Object) "true")
>         .put("receive.buffer.bytes", 1024 * 1024)
>         .build())
>     // set a Coder for Key and Value
>     .withoutMetadata())
>     .apply("Transform ", MapElements.via(new SimpleFunction<KV<String, String>, String>() {
>         @Override
>         public String apply(KV<String, String> input) {
>             log.debug("{}", input.getValue());
>             return input.getKey() + " " + input.getValue();
>         }
>     }));
>
>
> p.run();
>
>
>
>
>
> Result:
>
> May 04, 2017 5:02:13 PM org.apache.kafka.common.config.AbstractConfig
> logAll
>
> INFO: ConsumerConfig values:
>
> metric.reporters = []
>
> metadata.max.age.ms = 30
>
> value.deserializer = class org.apache.kafka.common.serialization.
> ByteArrayDeserializer
>
> group.id = http-logs-beam-json
>
> partition.assignment.strategy = [org.apache.kafka.clients.
> consumer.RangeAssignor]
>
> reconnect.backoff.ms = 50
>
> sasl.kerberos.ticket.renew.window.factor = 0.8
>
> max.partition.fetch.bytes = 1048576
>
> bootstrap.servers = [datanode2-cm1.mis-cds.local:6667,
> datanode3-cm1.mis-cds.local:6667, datanode6-cm1.mis-cds.local:6667]
>
> retry.backoff.ms = 100
>
> sasl.kerberos.kinit.cmd = /usr/bin/kinit
>
> sasl.kerberos.service.name = null
>
> sasl.kerberos.ticket.renew.jitter = 0.05
>
> ssl.keystore.type = JKS
>
> ssl.trustmanager.algorithm = PKIX
>
> enable.auto.commit = true
>
> ssl.key.password = null
>
> fetch.max.wait.ms = 500
>
> sasl.kerberos.min.time.before.relogin = 6
>
> connections.max.idle.ms = 54
>
> ssl.truststore.password = null
>
> session.timeout.ms = 3
>
> metrics.num.samples = 2
>
> client.id =
>
> ssl.endpoint.identification.algorithm = null
>
> key.deserializer = class org.apache.kafka.common.serialization.
> ByteArrayDeserializer
>
> ssl.protocol = TLS
>
> check.crcs = true
>
> request.timeout.ms = 4
>
>ssl.provider = null
>
> ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
>
> ssl.keystore.location = null
>
> heartbeat.interval.ms = 3000
>
> auto.commit.interval.ms = 5000
>
> receive.buffer.bytes = 1048576
>
> ssl.cipher.suites = null
>
> ssl.truststore.type = JKS
>
> security.protocol = PLAINTEXT
>
> ssl.truststore.location = null
>
> ssl.keystore.password = null
>
> ssl.keymanager.algorithm = SunX509
>
> metrics.sample.window.ms = 3
>
> fetch.min.bytes = 1
>
> send.buffer.bytes = 131072
>
> auto.offset.reset = earliest
>
>
>
> May 04, 2017 5:02:13 PM org.apache.kafka.common.utils.AppInfoParser$AppInfo
> 
>
> INFO: Kafka version : 0.9.0.1
>
> May 04, 2017 5:02:13 PM org.apache.kafka.common.utils.AppInfoParser$AppInfo
> 
>
> INFO: Kafka commitId : 23c69d62a0cabf06
>
> May 04, 2017 5:02:13 PM 
> org.apache.beam.sdk.io.kafka.KafkaIO$UnboundedKafkaSource
> generateInitialSplits
>
> INFO: Partitions assigned to split 0 (total 1): test-http-logs-json-0
>
> May 04, 2017 5:02:13 PM org.apache.kafka.common.config.AbstractConfig
> logAll
>
> INFO: ConsumerConfig 

Re: Http sink - Does it make sense?

2017-05-04 Thread Seshadri Raghunathan
Hi Josh,

If you are looking for RestIO, I saw some discussions earlier in the group.
This JIRA has more details on it -
https://issues.apache.org/jira/browse/BEAM-1946

Hope that helps !

Regards,
Seshadri

On Thu, May 4, 2017 at 9:19 AM, Josh  wrote:

> Hi all,
>
> I'm computing some counts in a Beam job and want to sink the results to a
> REST API via HTTP. For example, group incoming elements by a key, count by
> key, and then every minute write the count to an API which supports
> incremental updates.
>
> Is this something that other people have done with Beam? I was unable to
> find any examples of an Http sink online. If I write my own custom sink to
> do this, is there anything to be wary of?
>
> Thanks for any advice,
> Josh
>
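
Until something like BEAM-1946 lands, a custom ParDo is the usual way to write
to an HTTP endpoint. Below is a rough sketch of such a DoFn using
java.net.HttpURLConnection; the class name, JSON payload shape and error
handling are illustrative, and a real sink should make the update idempotent
and retry with backoff, since bundles can be re-executed by the runner.

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;

/** Sketch: POSTs each (key, count) pair to a REST endpoint. */
public class HttpWriteFn extends DoFn<KV<String, Long>, Void> {
  private final String endpoint;

  public HttpWriteFn(String endpoint) {
    this.endpoint = endpoint;
  }

  @ProcessElement
  public void processElement(ProcessContext c) throws Exception {
    byte[] body = String.format("{\"key\":\"%s\",\"count\":%d}",
        c.element().getKey(), c.element().getValue()).getBytes(StandardCharsets.UTF_8);
    HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
    try {
      conn.setDoOutput(true);
      conn.setRequestMethod("POST");
      conn.setRequestProperty("Content-Type", "application/json");
      try (OutputStream out = conn.getOutputStream()) {
        out.write(body);
      }
      int status = conn.getResponseCode();
      if (status >= 300) {
        // Throwing makes the runner retry the bundle, so the endpoint
        // needs to tolerate duplicate writes.
        throw new RuntimeException("HTTP write failed with status " + status);
      }
    } finally {
      conn.disconnect();
    }
  }
}

In the scenario described above it would sit after the windowed count, e.g.
counts.apply(ParDo.of(new HttpWriteFn("https://example.com/api/counts"))),
where the endpoint URL is of course hypothetical.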


Http sink - Does it make sense?

2017-05-04 Thread Josh
Hi all,

I'm computing some counts in a Beam job and want to sink the results to a
REST API via HTTP. For example, group incoming elements by a key, count by
key, and then every minute write the count to an API which supports
incremental updates.

Is this something that other people have done with Beam? I was unable to
find any examples of an Http sink online. If I write my own custom sink to
do this, is there anything to be wary of?

Thanks for any advice,
Josh


Re: Slack access?

2017-05-04 Thread xumingmingv
Invite sent.

Sent from your iPhone

-- Original --
From: Conrad Crampton
Date: Friday, May 5, 2017, 00:06
To: user@beam.apache.org
Subject: Re: Slack access?

Hi, 
Can I be added to the Slack channel please when next updated?
 
conrad.cramp...@secdata.com
 
Many thanks
Conrad






Re: [HEADS UP] Using "new" filesystem layer

2017-05-04 Thread Lukasz Cwik
JB, for your second point it seems as though you may not be setting the
Hadoop configuration on HadoopFileSystemOptions.
Also, I just merged https://github.com/apache/beam/pull/2890, which will
auto-detect the Hadoop configuration based on your HADOOP_CONF_DIR and
YARN_CONF_DIR environment variables.
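
For reference, a rough sketch of what wiring the Hadoop configuration in looks
like, assuming the beam-sdks-java-io-hadoop-file-system module is on the
classpath; the NameNode address is illustrative and the exact option names may
differ slightly between versions.

import java.util.Collections;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.hdfs.HadoopFileSystemOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.hadoop.conf.Configuration;

public class HdfsOptionsSketch {
  public static void main(String[] args) {
    // Point fs.defaultFS at the NameNode so that hdfs://... paths can be
    // resolved by the Beam HDFS filesystem registrar.
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://localhost:8020");

    HadoopFileSystemOptions options =
        PipelineOptionsFactory.fromArgs(args).as(HadoopFileSystemOptions.class);
    options.setHdfsConfiguration(Collections.singletonList(conf));

    Pipeline pipeline = Pipeline.create(options);
    // ... build the pipeline; TextIO.write().to("hdfs://...") should now
    // find the hdfs registrar.
    pipeline.run();
  }
}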

On Thu, May 4, 2017 at 8:58 AM, Jean-Baptiste Onofré 
wrote:

> Hi guys,
>
> One of the key refactorings/new features we bring in the first stable release
> is the "new" Beam filesystems.
>
> I started to play with it on a couple of use cases I have in beam-samples.
>
> 1/ TextIO.write() with unbounded PCollection (stream)
>
> The first use case is TextIO write with an unbounded PCollection (good
> timing, as we had a question about this yesterday on Slack).
>
> I confirm that TextIO now supports unbounded PCollections. You have to
> create a Window and "flag" TextIO to use windowed writes.
>
> Here's the code snippet:
>
> pipeline
>     .apply(JmsIO.read().withConnectionFactory(connectionFactory)
>         .withQueue("BEAM"))
>     .apply(MapElements.via(new SimpleFunction<JmsRecord, String>() {
>         public String apply(JmsRecord input) {
>             return input.getPayload();
>         }
>     }))
>     .apply(Window.<String>into(FixedWindows.of(Duration.standardSeconds(10))))
>     .apply(TextIO.write()
>         .to("/home/jbonofre/demo/beam/output/uc2")
>         .withWindowedWrites()
>         .withNumShards(3));
>
> Thanks to Dan, I found an issue in the watermark handling of JmsIO (as it
> uses the JMS ack to advance the watermark, it should be client ack rather
> than auto ack). I'm preparing a PR for JmsIO about this.
> However, the "windowed" TextIO works fine.
>
> 2/ Beam HDFS filesystem
>
> The other use case is to use the "new" Beam filesystem with TextIO,
> especially HDFS.
>
> So, in my pipeline, I'm using:
>
> .apply(TextIO.write().to("hdfs://localhost/home/jbonofre/
> demo/beam/output/uc1"));
>
> In my pom.xml, I define both Beam hadoop-file-system and hadoop-client
> dependencies:
>
> <dependency>
>   <groupId>org.apache.beam</groupId>
>   <artifactId>beam-sdks-java-io-hadoop-file-system</artifactId>
>   <version>0.7.0-SNAPSHOT</version>
> </dependency>
> <dependency>
>   <groupId>org.apache.hadoop</groupId>
>   <artifactId>hadoop-client</artifactId>
>   <version>2.7.3</version>
> </dependency>
>
> Unfortunately, when starting the pipeline, I have:
>
> Exception in thread "main" java.lang.IllegalStateException: Unable to find registrar for hdfs
>     at org.apache.beam.sdk.io.FileSystems.getFileSystemInternal(FileSystems.java:427)
>     at org.apache.beam.sdk.io.FileSystems.matchNewResource(FileSystems.java:494)
>     at org.apache.beam.sdk.io.FileBasedSink.convertToFileResourceIfPossible(FileBasedSink.java:193)
>     at org.apache.beam.sdk.io.TextIO$Write.to(TextIO.java:292)
>     at org.apache.beam.samples.data.ingestion.JdbcToHdfs.main(JdbcToHdfs.java:39)
>
> I'm going to investigate tonight and I will let you know.
>
> Regards
> JB
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Slack access?

2017-05-04 Thread Conrad Crampton
Hi,
Can I be added to the Slack channel please when next updated?

conrad.cramp...@secdata.com

Many thanks
Conrad


SecureData, combating cyber threats
__ 
The information contained in this message or any of its attachments may be 
privileged and confidential and intended for the exclusive use of the intended 
recipient. If you are not the intended recipient any disclosure, reproduction, 
distribution or other dissemination or use of this communications is strictly 
prohibited. The views expressed in this email are those of the individual and 
not necessarily of SecureData Europe Ltd. Any prices quoted are only valid if 
followed up by a formal written quote.

SecureData Europe Limited. Registered in England & Wales 04365896. Registered 
Address: SecureData House, Hermitage Court, Hermitage Lane, Maidstone, Kent, 
ME16 9NT