Re: java.lang.IllegalAccessError with KafkaIO

2016-12-01 Thread Wayne Collins

Hi Max,

Here is the result from the "flink run" launcher node (devbox):
---
root@devbox:~# echo ${HADOOP_CLASSPATH}:${HADOOP_CONF_DIR}:${YARN_CONF_DIR}:${HBASE_CONF_DIR}

:/etc/hadoop-conf:/etc/yarn-conf:
---

Here is the result from one of the Cloudera YARN nodes as root:
---
[root@hadoop0 ~]# echo ${HADOOP_CLASSPATH}:${HADOOP_CONF_DIR}:${YARN_CONF_DIR}:${HBASE_CONF_DIR}

:::
---

Here is the result from one of the Cloudera YARN nodes as yarn:
---
[yarn@hadoop0 ~]$ echo ${HADOOP_CLASSPATH}:${HADOOP_CONF_DIR}:${YARN_CONF_DIR}:${HBASE_CONF_DIR}

:::
---


Note that both the yarn-session.sh and the flink run commands are run as 
root on devbox.


Software version details:
devbox has these versions of the client software:
flink-1.1.2
hadoop-2.6.0
kafka_2.11-0.9.0.1
(also reproduced the problem with kafka_2.10-0.9.0.1)

The cluster (providing YARN) is:
CDH5 - 5.8.2-1.cdh5.8.2.p0.3 (Hadoop 2.6.0)
Kafka - 2.0.2-1.2.0.2.p0.5 (Kafka 0.9.0)

Thanks for your help!
Wayne



On 2016-12-01 12:54 PM, Maximilian Michels wrote:

What is the output of the following on the nodes? I have a suspicion
that something sneaks in from one of the classpath variables that
Flink picks up:

echo ${HADOOP_CLASSPATH}:${HADOOP_CONF_DIR}:${YARN_CONF_DIR}:${HBASE_CONF_DIR}

On Tue, Nov 29, 2016 at 9:17 PM, Wayne Collins  wrote:

Hi Max,

I rebuilt my sandbox with Beam 0.3.0-incubating and Flink 1.1.2 and I'm
still seeing the following error message with the StreamWordCount demo code:

Caused by: java.lang.IllegalAccessError: tried to access method
com.google.common.base.Optional.<init>()V from class
com.google.common.base.Absent
 at com.google.common.base.Absent.<init>(Absent.java:35)
 at com.google.common.base.Absent.<clinit>(Absent.java:33)
 at sun.misc.Unsafe.ensureClassInitialized(Native Method)
...
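
One possible cause of an error like this is that more than one copy of Guava is visible at runtime, which would fit the classpath suspicion raised in this thread. The following is a minimal diagnostic sketch, not something from the original messages: it prints where each of the two classes named in the trace would be loaded from. The class name ClassOrigin and the method locate are hypothetical helpers. The sketch only reports locations for the JVM it runs in, so it would have to be run with the same classpath as the failing job to be meaningful.

```java
import java.security.CodeSource;

/** Hypothetical helper: reports which jar or directory a class is loaded from. */
final class ClassOrigin {

    /** Returns the code source location of the named class, or a short note if unavailable. */
    static String locate(String className) {
        try {
            // initialize=false: skip static initializers, which is where the trace above fails.
            Class<?> c = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // Classes loaded by the bootstrap loader (JDK classes) have no code source.
            if (src == null || src.getLocation() == null) {
                return "<bootstrap or unknown>";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "<not loadable: " + e + ">";
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "com.google.common.base.Optional",
            "com.google.common.base.Absent"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

If the two classes report different locations, that would point to conflicting Guava jars; if they report the same location, the cause lies elsewhere.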



(snip)


Re: Event-time based in-window trigger

2016-12-01 Thread Lukasz Cwik
Can you provide more details about the problem you're trying to solve, with
some examples showing input and the expected output?




On Wed, Nov 30, 2016 at 11:08 PM, Manu Zhang 
wrote:

> Hi,
>
> Recently I've been working on a problem where users want a trigger to fire
> after the watermark passes each element (i.e. in the middle of an event-time
> window). I have failed to find an existing trigger that does so. Any ideas on
> how to model this problem with Beam?
>
> Thanks,
> Manu Zhang
>


Re: Event-time based in-window trigger

2016-12-01 Thread Manu Zhang
My use case is to track user trajectories based on page view events when users
visit a website. The input would be a list of PageView(userId, url,
eventTimestamp) with watermarks (= eventTimestamp - duration). I'm trying
Sessions with an event-time trigger. Note that we can't wait for the end of the
session window because of our latency requirement. Instead, we want to emit the
user trajectories whenever the watermark passes a buffered PageView's event
time.

On Fri, Dec 2, 2016 at 5:41 AM Lukasz Cwik  wrote:

> Can you provide more details about the problem your trying to solve with
> some examples showing input and the expected output?
>
>
>
>
> On Wed, Nov 30, 2016 at 11:08 PM, Manu Zhang 
> wrote:
>
> Hi,
>
> Recently I’m addressing a problem where users want to trigger after
> watermark past each element (i.e. in the middle of event-time window). I
> fail to find an existing trigger that does so. Any idea on model this
> problem with Beam ?
>
> Thanks,
> Manu Zhang
>
>
>


Re: Event-time based in-window trigger

2016-12-01 Thread Kenneth Knowles
Thanks for laying out some details.

On Thu, Dec 1, 2016 at 7:09 PM, Manu Zhang  wrote:
>
> Yes, the difficulty is to define that trigger. The existing triggers fire
> at the end of window. (I could be mistaken, which will be good news)
>

You are not mistaken that the only existing event time trigger is the one
that fires at the end of the window. The trigger you describe would be a
new primitive trigger. It fits with the design, if we ensure monotonicity,
etc. Actually implementing it in the backend is easy, of course. We
actually had something like it, but didn't quite nail it down so we removed
it until we had a solid use case and design for it.

> B and C, which are not mutually exclusive.
> More on my use case. Say a user visits http://foo at 1, http://foo/bar at
> 4 and back to http://foo at 5 all in a Session
> we would want to emit
>
> http://foo  when the watermark passes 1
> http://foo -> http://foo/bar when the watermark passes 4
> http://foo -> http://foo/bar -> http://foo when the watermark passes 5
>

What would you want to emit when the watermark jumps from 0 to 7 and all
three of the above are buffered?

What would you want to emit when the watermark was at 9 and
http://foo/bizzle came in with timestamp 3?

Kenn


>
>
> On Fri, Dec 2, 2016 at 10:12 AM Ben Chambers  wrote:
>
>> As a clarifying question:
>>
>> If you have three elements in the pane with timestamps [1, 4, 5], would
>> you:
>> A. want to emit that entire pane when the watermark passes 1
>> B. want to emit that entire pane when the watermark passes 5
>> C. emit a fragment of that pane containing only the first element when
>> the watermark passes 1
>>
>> On Thu, Dec 1, 2016 at 6:01 PM Tyler Akidau  wrote:
>>
>> So what you want is essentially a trigger that fires when the watermark
>> has passed the event time of the oldest un-emitted element in the current
>> pane? You could them presumably wrap this in a repeat to get the overall
>> desired semantics, right?
>>
>> -Tyler
>>
>>
>> On Fri, Dec 2, 2016 at 7:32 AM Manu Zhang 
>> wrote:
>>
>> My use case is to track user trajectory based on page view event when
>> they visit a website.  The input would be like a list of
>> PageView(userId, url, eventTimestamp) with watermarks (= eventTimestamp -
>> duration). I'm trying Sessions with event time trigger. Note we can't wait
>> for the end of session window due to latency requirement. Instead, we want
>> to emit the user trajectories whenever a buffered PageView's event time is
>> passed by watermark.
>>
>> On Fri, Dec 2, 2016 at 5:41 AM Lukasz Cwik  wrote:
>>
>> Can you provide more details about the problem your trying to solve with
>> some examples showing input and the expected output?
>>
>>
>>
>>
>> On Wed, Nov 30, 2016 at 11:08 PM, Manu Zhang 
>> wrote:
>>
>> Hi,
>>
>> Recently I’m addressing a problem where users want to trigger after
>> watermark past each element (i.e. in the middle of event-time window). I
>> fail to find an existing trigger that does so. Any idea on model this
>> problem with Beam ?
>>
>> Thanks,
>> Manu Zhang
>>
>>
>>


Re: Event-time based in-window trigger

2016-12-01 Thread Tyler Akidau
And one more question while we're at it: what if you have events happening
every second within the window? Do you really want to emit a new pane every
second as the watermark progresses (assuming it progresses relatively
smoothly)? What if we're talking differences of event times of
milliseconds? Is one pane per millisecond what you want?

-Tyler

On Fri, Dec 2, 2016 at 10:41 AM Kenneth Knowles  wrote:

> Thanks for laying out some details.
>
> On Thu, Dec 1, 2016 at 7:09 PM, Manu Zhang 
> wrote:
>
> Yes, the difficulty is to define that trigger. The existing triggers fire
> at the end of window. (I could be mistaken, which will be good news)
>
>
> You are not mistaken that the only existing event time trigger is the one
> that fires at the end of the window. The trigger you describe would be a
> new primitive trigger. It fits with the design, if we ensure monotonicity,
> etc. Actually implementing it in the backend is easy, of course. We
> actually had something like it, but didn't quite nail it down so we removed
> it until we had a solid use case and design for it.
>
> B and C which are not mutually exclusive
> More on my use case. Say a user visits http://foo at 1, http://foo/bar at
> 4 and back to http://foo at 5 all in a Session
> we would want to emit
>
> http://foo  when the watermark passes 1
> http://foo -> http://foo/bar when the watermark passes 4
> http://foo -> http://foo/bar -> http://foo when the watermark passes 5
>
>
> What would you want to emit when the watermark jumps from 0 to 7 and all
> three of the above are buffered?
>
> What would you want to emit when the watermark was at 9 and
> http://foo/bizzle came in with timestamp 3?
>
> Kenn
>
>
>
>
> On Fri, Dec 2, 2016 at 10:12 AM Ben Chambers  wrote:
>
> As a clarifying question:
>
> If you have three elements in the pane with timestamps [1, 4, 5], would
> you:
> A. want to emit that entire pane when the watermark passes 1
> B. want to emit that entire pane when the watermark passes 5
> C. emit a fragment of that pane containing only the first element when the
> watermark passes 1
>
> On Thu, Dec 1, 2016 at 6:01 PM Tyler Akidau  wrote:
>
> So what you want is essentially a trigger that fires when the watermark
> has passed the event time of the oldest un-emitted element in the current
> pane? You could them presumably wrap this in a repeat to get the overall
> desired semantics, right?
>
> -Tyler
>
>
> On Fri, Dec 2, 2016 at 7:32 AM Manu Zhang  wrote:
>
> My use case is to track user trajectory based on page view event when they
> visit a website.  The input would be like a list of PageView(userId, url,
> eventTimestamp) with watermarks (= eventTimestamp - duration). I'm trying
> Sessions with event time trigger. Note we can't wait for the end of session
> window due to latency requirement. Instead, we want to emit the user
> trajectories whenever a buffered PageView's event time is passed by
> watermark.
>
> On Fri, Dec 2, 2016 at 5:41 AM Lukasz Cwik  wrote:
>
> Can you provide more details about the problem your trying to solve with
> some examples showing input and the expected output?
>
>
>
>
> On Wed, Nov 30, 2016 at 11:08 PM, Manu Zhang 
> wrote:
>
> Hi,
>
> Recently I’m addressing a problem where users want to trigger after
> watermark past each element (i.e. in the middle of event-time window). I
> fail to find an existing trigger that does so. Any idea on model this
> problem with Beam ?
>
> Thanks,
> Manu Zhang
>
>
>


Re: Event-time based in-window trigger

2016-12-01 Thread Manu Zhang
@Tyler,
Yes, the difficulty is defining that trigger. The existing triggers fire
at the end of the window. (I could be mistaken, which would be good news.)

@Ben,
B and C, which are not mutually exclusive.
More on my use case: say a user visits http://foo at 1, http://foo/bar at 4,
and goes back to http://foo at 5, all in one Session.
We would want to emit:

http://foo  when the watermark passes 1
http://foo -> http://foo/bar when the watermark passes 4
http://foo -> http://foo/bar -> http://foo when the watermark passes 5
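
The expected output above can be reproduced with a small plain-Java simulation. This is only a sketch of the desired semantics and does not use any Beam API: TrajectorySketch, PageView, exampleSession, and trajectoryAt are hypothetical names, and "the watermark passes t" is assumed to mean watermark >= t.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical simulation of the desired output; not Beam code. */
final class TrajectorySketch {

    /** Stand-in for PageView(userId, url, eventTimestamp); userId is omitted for brevity. */
    static final class PageView {
        final String url;
        final long eventTimestamp;

        PageView(String url, long eventTimestamp) {
            this.url = url;
            this.eventTimestamp = eventTimestamp;
        }
    }

    /** The three page views from the example: foo at 1, foo/bar at 4, foo at 5. */
    static List<PageView> exampleSession() {
        return Arrays.asList(
            new PageView("http://foo", 1),
            new PageView("http://foo/bar", 4),
            new PageView("http://foo", 5));
    }

    /** Joins, in event-time order, the URLs of all buffered views the watermark has reached. */
    static String trajectoryAt(List<PageView> buffered, long watermark) {
        List<PageView> sorted = new ArrayList<>(buffered);
        sorted.sort(Comparator.comparingLong((PageView v) -> v.eventTimestamp));
        StringBuilder out = new StringBuilder();
        for (PageView v : sorted) {
            if (v.eventTimestamp > watermark) {
                break; // assumption: "passes t" means watermark >= t
            }
            if (out.length() > 0) {
                out.append(" -> ");
            }
            out.append(v.url);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        for (long watermark : new long[] {1, 4, 5}) {
            System.out.println("watermark " + watermark + ": "
                + trajectoryAt(exampleSession(), watermark));
        }
        // watermark 1: http://foo
        // watermark 4: http://foo -> http://foo/bar
        // watermark 5: http://foo -> http://foo/bar -> http://foo
    }
}
```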


On Fri, Dec 2, 2016 at 10:12 AM Ben Chambers  wrote:

> As a clarifying question:
>
> If you have three elements in the pane with timestamps [1, 4, 5], would
> you:
> A. want to emit that entire pane when the watermark passes 1
> B. want to emit that entire pane when the watermark passes 5
> C. emit a fragment of that pane containing only the first element when the
> watermark passes 1
>
> On Thu, Dec 1, 2016 at 6:01 PM Tyler Akidau  wrote:
>
> So what you want is essentially a trigger that fires when the watermark
> has passed the event time of the oldest un-emitted element in the current
> pane? You could them presumably wrap this in a repeat to get the overall
> desired semantics, right?
>
> -Tyler
>
>
> On Fri, Dec 2, 2016 at 7:32 AM Manu Zhang  wrote:
>
> My use case is to track user trajectory based on page view event when they
> visit a website.  The input would be like a list of PageView(userId, url,
> eventTimestamp) with watermarks (= eventTimestamp - duration). I'm trying
> Sessions with event time trigger. Note we can't wait for the end of session
> window due to latency requirement. Instead, we want to emit the user
> trajectories whenever a buffered PageView's event time is passed by
> watermark.
>
> On Fri, Dec 2, 2016 at 5:41 AM Lukasz Cwik  wrote:
>
> Can you provide more details about the problem your trying to solve with
> some examples showing input and the expected output?
>
>
>
>
> On Wed, Nov 30, 2016 at 11:08 PM, Manu Zhang 
> wrote:
>
> Hi,
>
> Recently I’m addressing a problem where users want to trigger after
> watermark past each element (i.e. in the middle of event-time window). I
> fail to find an existing trigger that does so. Any idea on model this
> problem with Beam ?
>
> Thanks,
> Manu Zhang
>
>
>


Re: Event-time based in-window trigger

2016-12-01 Thread Manu Zhang
@Kenn,

1. When the watermark jumps from 0 to 7, http://foo -> http://foo/bar ->
http://foo will be emitted.
We can emit all events in the pane whose timestamps are before the watermark.
2. http://foo -> http://foo/bizzle -> http://foo/bar -> http://foo will be
emitted if the late element is within the allowed lateness,
which Beam already allows us to do.
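
The two answers above can be sketched as a small plain-Java state machine. This is an illustration of the described behaviour under stated assumptions, not a Beam trigger: LatePaneSketch, onElement, and onWatermark are hypothetical names, the allowed lateness of 10 is an arbitrary value for the demo, and "passes t" is assumed to mean watermark >= t.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of per-element firing with watermark jumps and late data; not Beam code. */
final class LatePaneSketch {
    private final List<String> urls = new ArrayList<>();   // buffered URLs in event-time order
    private final List<Long> times = new ArrayList<>();    // matching event timestamps
    private final long allowedLateness;
    private long watermark = Long.MIN_VALUE;
    private int emittedCount = 0;                           // elements covered by the last pane

    LatePaneSketch(long allowedLateness) {
        this.allowedLateness = allowedLateness;
    }

    /**
     * Buffers an element. Returns a new pane if the element is late but within the allowed
     * lateness; returns null if it is merely buffered or dropped for being too late.
     */
    String onElement(String url, long timestamp) {
        boolean late = timestamp <= watermark;
        if (late && watermark - timestamp > allowedLateness) {
            return null; // too late: dropped
        }
        int i = 0;
        while (i < times.size() && times.get(i) <= timestamp) {
            i++;
        }
        urls.add(i, url);
        times.add(i, timestamp);
        return late ? fire() : null;
    }

    /** Advances the watermark and returns one pane if any new element became ready, else null. */
    String onWatermark(long newWatermark) {
        watermark = Math.max(watermark, newWatermark);
        return fire();
    }

    private String fire() {
        int ready = 0;
        while (ready < times.size() && times.get(ready) <= watermark) {
            ready++;
        }
        if (ready == emittedCount) {
            return null; // nothing new to emit
        }
        emittedCount = ready;
        return String.join(" -> ", urls.subList(0, ready));
    }

    public static void main(String[] args) {
        LatePaneSketch s = new LatePaneSketch(10);
        s.onWatermark(0);
        s.onElement("http://foo", 1);
        s.onElement("http://foo/bar", 4);
        s.onElement("http://foo", 5);
        // Case 1: the watermark jumps from 0 to 7, so a single pane holds all three views.
        System.out.println(s.onWatermark(7));
        // prints: http://foo -> http://foo/bar -> http://foo
        s.onWatermark(9);
        // Case 2: a late view with timestamp 3 arrives while the watermark is at 9.
        System.out.println(s.onElement("http://foo/bizzle", 3));
        // prints: http://foo -> http://foo/bizzle -> http://foo/bar -> http://foo
    }
}
```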

To elaborate on the use case: when users are visiting Amazon, we want to
offer them the best recommendations.
Thus, we would like to know what leads to their final decision, and to track
the pages they visit until they click the "Add to cart" button.
It will be too late if we only send the results when they finish shopping.

@Tyler,
I don't think that is likely to happen in my use case. Think about a user
jumping between pages like crazy. Meanwhile, we can control how fast the
watermark progresses as long as it meets the latency requirement.




On Fri, Dec 2, 2016 at 11:45 AM Tyler Akidau  wrote:

> And one more question while we're at it: what if you have events happening
> every second within the window? Do you really want to emit a new pane every
> second as the watermark progresses (assuming it progresses relatively
> smoothly)? What if we're talking differences of event times of
> milliseconds? Is one pane per millisecond what you want?
>
> -Tyler
>
> On Fri, Dec 2, 2016 at 10:41 AM Kenneth Knowles  wrote:
>
> Thanks for laying out some details.
>
> On Thu, Dec 1, 2016 at 7:09 PM, Manu Zhang 
> wrote:
>
> Yes, the difficulty is to define that trigger. The existing triggers fire
> at the end of window. (I could be mistaken, which will be good news)
>
>
> You are not mistaken that the only existing event time trigger is the one
> that fires at the end of the window. The trigger you describe would be a
> new primitive trigger. It fits with the design, if we ensure monotonicity,
> etc. Actually implementing it in the backend is easy, of course. We
> actually had something like it, but didn't quite nail it down so we removed
> it until we had a solid use case and design for it.
>
> B and C which are not mutually exclusive
> More on my use case. Say a user visits http://foo at 1, http://foo/bar at
> 4 and back to http://foo at 5 all in a Session
> we would want to emit
>
> http://foo  when the watermark passes 1
> http://foo -> http://foo/bar when the watermark passes 4
> http://foo -> http://foo/bar -> http://foo when the watermark passes 5
>
>
> What would you want to emit when the watermark jumps from 0 to 7 and all
> three of the above are buffered?
>
> What would you want to emit when the watermark was at 9 and
> http://foo/bizzle came in with timestamp 3?
>
> Kenn
>
>
>
>
> On Fri, Dec 2, 2016 at 10:12 AM Ben Chambers  wrote:
>
> As a clarifying question:
>
> If you have three elements in the pane with timestamps [1, 4, 5], would
> you:
> A. want to emit that entire pane when the watermark passes 1
> B. want to emit that entire pane when the watermark passes 5
> C. emit a fragment of that pane containing only the first element when the
> watermark passes 1
>
> On Thu, Dec 1, 2016 at 6:01 PM Tyler Akidau  wrote:
>
> So what you want is essentially a trigger that fires when the watermark
> has passed the event time of the oldest un-emitted element in the current
> pane? You could them presumably wrap this in a repeat to get the overall
> desired semantics, right?
>
> -Tyler
>
>
> On Fri, Dec 2, 2016 at 7:32 AM Manu Zhang  wrote:
>
> My use case is to track user trajectory based on page view event when they
> visit a website.  The input would be like a list of PageView(userId, url,
> eventTimestamp) with watermarks (= eventTimestamp - duration). I'm trying
> Sessions with event time trigger. Note we can't wait for the end of session
> window due to latency requirement. Instead, we want to emit the user
> trajectories whenever a buffered PageView's event time is passed by
> watermark.
>
> On Fri, Dec 2, 2016 at 5:41 AM Lukasz Cwik  wrote:
>
> Can you provide more details about the problem your trying to solve with
> some examples showing input and the expected output?
>
>
>
>
> On Wed, Nov 30, 2016 at 11:08 PM, Manu Zhang 
> wrote:
>
> Hi,
>
> Recently I’m addressing a problem where users want to trigger after
> watermark past each element (i.e. in the middle of event-time window). I
> fail to find an existing trigger that does so. Any idea on model this
> problem with Beam ?
>
> Thanks,
> Manu Zhang
>
>
>


Re: Event-time based in-window trigger

2016-12-01 Thread Tyler Akidau
So what you want is essentially a trigger that fires when the watermark has
passed the event time of the oldest un-emitted element in the current pane?
You could then presumably wrap this in a repeat to get the overall desired
semantics, right?

-Tyler
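
The formulation above can be sketched in plain Java with a min-heap of un-emitted event times. This is not a Beam trigger implementation: OldestUnemittedTrigger, shouldFire, onFire, and countFirings are hypothetical names, and it is an assumption of this sketch that a firing marks every element the watermark has reached as emitted.

```java
import java.util.PriorityQueue;

/** Hypothetical sketch: fire when the watermark reaches the oldest un-emitted element. */
final class OldestUnemittedTrigger {
    private final PriorityQueue<Long> unemitted = new PriorityQueue<>(); // min-heap of event times

    void onElement(long eventTime) {
        unemitted.add(eventTime);
    }

    /** True when the watermark has reached the oldest un-emitted event time. */
    boolean shouldFire(long watermark) {
        return !unemitted.isEmpty() && unemitted.peek() <= watermark;
    }

    /** Assumption: a firing marks every element the watermark has reached as emitted. */
    void onFire(long watermark) {
        while (shouldFire(watermark)) {
            unemitted.poll();
        }
    }

    /** The "repeat" wrapper: re-arm after each firing and count firings over a watermark sequence. */
    static int countFirings(long[] elementTimes, long[] watermarks) {
        OldestUnemittedTrigger trigger = new OldestUnemittedTrigger();
        for (long t : elementTimes) {
            trigger.onElement(t);
        }
        int firings = 0;
        for (long w : watermarks) {
            if (trigger.shouldFire(w)) {
                firings++;
                trigger.onFire(w);
            }
        }
        return firings;
    }

    public static void main(String[] args) {
        long[] elements = {1, 4, 5};
        // A smoothly advancing watermark fires once per element.
        System.out.println(countFirings(elements, new long[] {0, 1, 2, 3, 4, 5, 6, 7})); // 3
        // A watermark that jumps from 0 to 7 fires once for all three elements.
        System.out.println(countFirings(elements, new long[] {0, 7})); // 1
    }
}
```

With this definition the number of panes depends on how finely the watermark advances, so densely spaced events and a smooth watermark would produce one pane per distinct event time.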


On Fri, Dec 2, 2016 at 7:32 AM Manu Zhang  wrote:

> My use case is to track user trajectory based on page view event when they
> visit a website.  The input would be like a list of PageView(userId, url,
> eventTimestamp) with watermarks (= eventTimestamp - duration). I'm trying
> Sessions with event time trigger. Note we can't wait for the end of session
> window due to latency requirement. Instead, we want to emit the user
> trajectories whenever a buffered PageView's event time is passed by
> watermark.
>
> On Fri, Dec 2, 2016 at 5:41 AM Lukasz Cwik  wrote:
>
> Can you provide more details about the problem your trying to solve with
> some examples showing input and the expected output?
>
>
>
>
> On Wed, Nov 30, 2016 at 11:08 PM, Manu Zhang 
> wrote:
>
> Hi,
>
> Recently I’m addressing a problem where users want to trigger after
> watermark past each element (i.e. in the middle of event-time window). I
> fail to find an existing trigger that does so. Any idea on model this
> problem with Beam ?
>
> Thanks,
> Manu Zhang
>
>
>


Re: Support for reading avro files from HDFS?

2016-12-01 Thread Dan Halperin
Hi Rico,

As a short-term workaround, you should also be able to use the
HadoopFileSource with AvroInputFormat.
https://avro.apache.org/docs/1.7.6/api/java/org/apache/avro/mapred/AvroInputFormat.html

On Mon, Nov 21, 2016 at 3:34 AM, Amit Sela  wrote:

> Hi Rico,
>
> We're working on supporting different IOChannelFactory implementations.
> HDFS is definitely at the top of the list.
> For now it is supported only with HdfsIO but that won't help you with
> using AvroIO or TextIO on top of HDFS.
>
> You can follow the progress in the ticket BEAM-59.
> And here's a design proposal.
>
> I'll make sure we properly announce this as well once it's in.
>
> Thanks!
>
> On Mon, Nov 21, 2016 at 1:29 PM Bergmann, Rico (GfK External) <
> rico.bergm...@ext.gfk.com> wrote:
>
> Hi!
>
>
>
> I was using AvroIO.Read to read an Avro file (snappy encoded) from
> HDFS. But the IOChannelFactory by default has no handler for hdfs
> resources. Is there one, and if so, how can I use it?
>
>
>
> Best, Rico
>
>
>