RE: third party service to poll Fluo for absence of event

2017-02-17 Thread Meier, Caleb
Thanks a lot Keith.  That was really helpful.  I'll try tinkering with the
fluo.impl.ScanTask.maxSleep property to see if I can liven things up a bit.

Caleb A. Meier, Ph.D.
Software Engineer II ♦ Analyst
Parsons Corporation
1911 N. Fort Myer Drive, Suite 800 ♦ Arlington, VA 22209
Office:  (703)797-3066
caleb.me...@parsons.com ♦ www.parsons.com

-Original Message-
From: Keith Turner [mailto:ke...@deenlo.com] 
Sent: Thursday, February 16, 2017 6:02 PM
To: dev@fluo.incubator.apache.org
Subject: Re: third party service to poll Fluo for absence of event

On Thu, Feb 16, 2017 at 12:08 PM, Meier, Caleb <caleb.me...@parsons.com> wrote:
> Quick question.  How timely is Fluo when it comes to processing 
> notifications?  If there are enough workers, will a notification be processed 
> in a timely manner after writing to the observed column?  Earlier we had a 
> discussion about a periodic query service.
> If I write a notification to issue a periodic query to Fluo, can I expect 
> that my Observer will process that query fairly quickly (provided there are 
> enough workers)?

Fluo keeps track of the last time it saw a notification in a tablet and 
exponentially increases the scan period for that tablet when it keeps seeing 
nothing.  The increase is up to a configurable maximum.

The following code does the backoff.

  
https://github.com/apache/incubator-fluo/blob/rel/fluo-1.0.0-incubating/modules/core/src/main/java/org/apache/fluo/core/worker/finder/hash/TabletData.java
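Keith's description of the per-tablet backoff (the scan period grows exponentially while a tablet keeps showing nothing, up to a configurable maximum) can be sketched roughly as below. This is a hypothetical class, not Fluo's TabletData code; the doubling factor, the minimum sleep, and resetting to the minimum when notifications are found are assumptions made for illustration.

```java
/**
 * Hypothetical per-tablet scan scheduler showing capped exponential backoff.
 * Not Fluo's TabletData: the doubling factor, the minimum sleep, and the
 * reset-on-activity rule are illustrative assumptions.
 */
final class TabletScanBackoff {
  private final long minSleepMs;
  private final long maxSleepMs;
  private long currentSleepMs;

  TabletScanBackoff(long minSleepMs, long maxSleepMs) {
    if (minSleepMs <= 0 || maxSleepMs < minSleepMs) {
      throw new IllegalArgumentException("require 0 < minSleepMs <= maxSleepMs");
    }
    this.minSleepMs = minSleepMs;
    this.maxSleepMs = maxSleepMs;
    this.currentSleepMs = minSleepMs;
  }

  /** Records the outcome of one scan and returns how long to wait before the next. */
  long afterScan(int notificationsFound) {
    if (notificationsFound > 0) {
      // Activity seen (assumption): scan this tablet frequently again.
      currentSleepMs = minSleepMs;
    } else if (currentSleepMs > maxSleepMs / 2) {
      // Doubling would exceed (or overflow past) the cap, so clamp to it.
      currentSleepMs = maxSleepMs;
    } else {
      // Still nothing in this tablet: double the wait.
      currentSleepMs = currentSleepMs * 2;
    }
    return currentSleepMs;
  }
}
```

Under these assumptions, with a 1 second minimum and a 5 minute cap, nine consecutive empty scans reach the cap (2 s, 4 s, ..., 256 s, then 300 s), which is one way to see why a quiet tablet can take minutes to notice a new notification.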
 

The following is where it gets the max sleep time.

  
https://github.com/apache/incubator-fluo/blob/rel/fluo-1.0.0-incubating/modules/core/src/main/java/org/apache/fluo/core/worker/finder/hash/ScanTask.java#L72
 

Looking at the following, the default max sleep time for a tablet is 5
minutes.  If I expanded the constant correctly, this can be changed
by setting fluo.impl.ScanTask.maxSleep.  Note that impl properties are not part of
the public API and are subject to change when the implementation changes.

  
https://github.com/apache/incubator-fluo/blob/rel/fluo-1.0.0-incubating/modules/core/src/main/java/org/apache/fluo/core/impl/FluoConfigurationImpl.java#L37
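A lookup of the property Keith names might look like the sketch below: use fluo.impl.ScanTask.maxSleep if it is set, otherwise fall back to 5 minutes. The property name comes from the thread; treating the value as milliseconds, and reading it from a plain java.util.Properties object, are assumptions rather than Fluo's actual configuration code.

```java
import java.util.Properties;
import java.util.concurrent.TimeUnit;

/** Hypothetical reader for the scan max-sleep override (not Fluo's own code). */
final class ScanMaxSleepConfig {
  // Property name from the thread; an impl property, so not part of the public API.
  static final String MAX_SLEEP_PROP = "fluo.impl.ScanTask.maxSleep";
  // Assumption: the value is in milliseconds; the thread gives the default as 5 minutes.
  static final long DEFAULT_MAX_SLEEP_MS = TimeUnit.MINUTES.toMillis(5);

  static long maxSleepMs(Properties props) {
    String raw = props.getProperty(MAX_SLEEP_PROP);
    if (raw == null || raw.trim().isEmpty()) {
      return DEFAULT_MAX_SLEEP_MS;
    }
    // Long.parseLong throws NumberFormatException for non-numeric values.
    long value = Long.parseLong(raw.trim());
    if (value <= 0) {
      throw new IllegalArgumentException(MAX_SLEEP_PROP + " must be positive: " + value);
    }
    return value;
  }
}
```

Lowering the value trades more frequent scans of idle tablets for faster pickup of new notifications; Keith's caveat that impl properties may change between releases applies to any such override.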
 

Also, when Fluo has notifications queued, it will wait until the queue size
halves before scanning again for notifications.  So if 10,000 notifications
were queued, then scanning for notifications would not happen until the queue
size was 5,000 or less.  The following code shows where that happens.  I
noticed this while looking for the max scan sleep code.

  
https://github.com/apache/incubator-fluo/blob/rel/fluo-1.0.0-incubating/modules/core/src/main/java/org/apache/fluo/core/worker/finder/hash/ScanTask.java#L85
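The queue-halving rule Keith describes can be modeled as a small gate: remember how many notifications were queued after the last scan, and allow the next scan only once the queue has drained to half of that. The class below is a hypothetical illustration of the rule, not the ScanTask implementation.

```java
/** Hypothetical model of "don't rescan until the notification queue has halved". */
final class HalvingScanGate {
  // Queue size observed right after the most recent scan (0 before any scan).
  private int queuedAtLastScan = 0;

  /** Called after a scan has queued notifications; remembers the resulting size. */
  void recordScan(int queueSizeAfterScan) {
    queuedAtLastScan = queueSizeAfterScan;
  }

  /** True once the queue has drained to half (or less) of its post-scan size. */
  boolean shouldScan(int currentQueueSize) {
    return currentQueueSize <= queuedAtLastScan / 2;
  }
}
```

With 10,000 notifications recorded after a scan, shouldScan returns false at 6,000 and true at 5,000, matching Keith's example.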
 


RE: third party service to poll Fluo for absence of event

2017-02-03 Thread Meier, Caleb
Thanks for the feedback everyone.  

Keith, we are currently exporting to Kafka, but I don't foresee needing to 
ingest from Kafka into Fluo for this project.  I will certainly keep you 
updated if that changes.

Thanks,


-Original Message-
From: Keith Turner [mailto:ke...@deenlo.com] 
Sent: Friday, February 03, 2017 10:05 AM
To: dev@fluo.incubator.apache.org
Subject: Re: third party service to poll Fluo for absence of event

If you have time to share as you move forward, I am interested in learning from
your experiences with using Kafka and Fluo together. I have wanted to
experiment with this in order to see if Fluo needed any changes to support
better interoperation.  However, I have not had the time.  I opened #795[1],
but it's based on speculation, not experience.

[1]: 
https://github.com/apache/incubator-fluo/issues/795
 


RE: third party service to poll Fluo for absence of event

2017-02-02 Thread Meier, Caleb
Thanks for the input.  I'm currently looking at creating some sort of
coordinator (which wraps a ScheduledExecutorService to generate periodic
notifications) and a collection of workers (to process the periodic queries as
they are issued).  Most of the interaction between the workers and coordinator
will be via Kafka (develop some sort of protocol to ensure that more than one
worker isn't getting assigned the same query).  At any rate, I was thinking of
implementing these components as TwillRunnables.  However, it seems like the
Twill documentation is a bit sparse.  Given that you guys implemented Fluo as a
TwillApplication, do you have any insight/advice for writing TwillApplications?
In particular, how is your FluoTwillApp being run?  All of the examples I've
seen create a client with a TwillRunner and TwillController.  It seems like
you've created your own version of a YarnAppRunner -- what role is that playing
in running the FluoTwillApp?  Moreover, it is also unclear to me whether the
TwillRunnables are bound to the client -- if the client terminates, do the
runnables terminate as well?  So essentially, it is unclear to me how to create
a long-running application in Twill that is not bound to a particular client.
Sorry that this is a little off topic, but any help or references to
documentation/examples would be much appreciated.
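A rough sketch of the coordinator described above: a ScheduledExecutorService emits one notification per query per period, and a claim step ensures that only one worker takes each notification. The class and method names are hypothetical, and the in-memory ConcurrentHashMap.putIfAbsent claim stands in for the Kafka-based protocol, which is not specified here.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

/**
 * Hypothetical coordinator: emits periodic query notifications and lets exactly
 * one worker claim each one. The in-memory claim map stands in for a
 * Kafka-based assignment protocol, which is not modeled here.
 */
final class PeriodicQueryCoordinator implements AutoCloseable {
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();
  // notification id -> id of the worker that claimed it
  private final Map<String, String> claims = new ConcurrentHashMap<>();
  // per-query tick counter, used to make notification ids unique
  private final Map<String, AtomicLong> ticks = new ConcurrentHashMap<>();

  /**
   * Sends "queryId#tick" to the sink immediately and then every periodMs milliseconds.
   * The sink must not throw: scheduleAtFixedRate suppresses later runs after an exception.
   */
  ScheduledFuture<?> schedule(String queryId, long periodMs, Consumer<String> sink) {
    AtomicLong tick = ticks.computeIfAbsent(queryId, k -> new AtomicLong());
    return scheduler.scheduleAtFixedRate(
        () -> sink.accept(queryId + "#" + tick.incrementAndGet()),
        0, periodMs, TimeUnit.MILLISECONDS);
  }

  /** Returns true only for the first worker that claims a given notification. */
  boolean tryClaim(String notification, String workerId) {
    return claims.putIfAbsent(notification, workerId) == null;
  }

  Optional<String> owner(String notification) {
    return Optional.ofNullable(claims.get(notification));
  }

  @Override
  public void close() {
    scheduler.shutdownNow();
  }
}
```

Tagging each notification with a per-query tick lets a late or duplicated delivery be recognized as already claimed, which is the property the Kafka protocol would need to preserve across processes.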


-Original Message-
From: Keith Turner [mailto:ke...@deenlo.com] 
Sent: Wednesday, February 01, 2017 11:03 PM
To: dev@fluo.incubator.apache.org
Subject: Re: third party service to poll Fluo for absence of event

On Wed, Feb 1, 2017 at 9:54 PM, Christopher <ctubb...@apache.org> wrote:
> On Wed, Feb 1, 2017 at 10:04 AM Meier, Caleb <caleb.me...@parsons.com>
> wrote:
>
>> Yeah, this seems pretty reasonable to me.  I guess it then boils down 
>> to the nitty gritty of do I store results in Fluo and have my service 
>> query Fluo (I think you guys actually advise against that in your 
>> documentation), or export results and then have the service query 
>> some external index that I am exporting to.
>>
>>
> I'm not sure we advise against it, so much as recognize that it may 
> not be suitable for certain use cases and may not meet query 
> performance expectations ( 
> http://fluo.apache.org/docs/fluo-recipes/1.0.0-incubating/export-queue/
>  ).
>

I would advise against querying Fluo for low-latency queries.
However, this external service that's checking a few stats within Fluo and
injecting new notifications probably does not care about latency.

The reason Fluo is not geared towards low latency is that it does lazy
recovery of failed transactions.   Failed transactions are not cleaned
up until something tries to read the data, which could significantly delay 
reads.

> In any case, your observer need not write the final "last occurrence"
> entries into a Fluo table. It could write them anywhere.
>
>
>> Regarding timestamps, does the oracle server provide actual 
>> timestamps or just logical timestamps?  That is, could I use the 
>> timestamps that the server provides to define some sort of now() 
>> function to obtain the current time to compare with the times of incoming 
>> events?
>>
>
> Just logical time, and it delivers batches to limit locking, so it can 
> appear to jump ahead spontaneously. I'm not sure the OracleServer is 
> suitable for this purpose. What level of precision are you going for? 
> It might be enough to just run NTP, if you don't need more precision 
> than "within seconds".
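Christopher's point that the oracle hands out logical time in batches can be illustrated with a toy model. This is not Fluo's OracleServer or OracleClient; the block size and the per-client allocation scheme are assumptions. It shows why stamps are unique and increasing for each client yet carry no wall-clock meaning, and why a client's next stamp can jump ahead when another client has reserved the intervening block.

```java
/**
 * Toy model (not Fluo's OracleServer/OracleClient) of an oracle that hands out
 * contiguous blocks of logical timestamps to limit coordination.
 */
final class BatchingLogicalOracle {
  private final int batchSize;
  private long nextUnallocated = 1;

  BatchingLogicalOracle(int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    this.batchSize = batchSize;
  }

  /** Reserves the block [start, start + batchSize) and returns start. */
  synchronized long reserveBlock() {
    long start = nextUnallocated;
    nextUnallocated += batchSize;
    return start;
  }

  Client newClient() {
    return new Client();
  }

  /** One client's view: draws stamps from its current block (not thread-safe). */
  final class Client {
    private long next;
    private long limit;

    long nextStamp() {
      if (next == limit) {
        // Block exhausted (or none yet): ask the oracle for a fresh block.
        next = reserveBlock();
        limit = next + batchSize;
      }
      return next++;
    }
  }
}
```

With a batch size of 100, client A receives 1 through 100, client B's first stamp is 101, and A's 101st stamp is 201, so comparing these numbers with event times says nothing about elapsed wall-clock time.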

Re: third party service to poll Fluo for absence of event

2017-02-01 Thread Meier, Caleb
Yeah, this seems pretty reasonable to me.  I guess it then boils down to the 
nitty gritty of do I store results in Fluo and have my service query Fluo (I 
think you guys actually advise against that in your documentation), or export 
results and then have the service query some external index that I am exporting 
to.  

Regarding timestamps, does the oracle server provide actual timestamps or just 
logical timestamps?  That is, could I use the timestamps that the server 
provides to define some sort of now() function to obtain the current time to 
compare with the times of incoming events?

From: Christopher <ctubb...@apache.org>
Sent: Tuesday, January 31, 2017 5:08 PM
To: dev@fluo.incubator.apache.org
Subject: Re: third party service to poll Fluo for absence of event

You could write an observer which rolls up timestamps from all the events
you are concerned about, and puts the most recent event timestamp into a
centralized place, which you could poll. If there is no ingest of these
events, then the last timestamp in this central place will exceed some
threshold and the poller could detect that and trigger additional actions.
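The rollup suggested above, combined with the 24-hour absence check from the original question, could look like the sketch below. An observer would call record() for each event, keeping only the newest timestamp per event type, and a separate poller would call staleTypes() on a schedule. The class is hypothetical and uses an in-memory map as the "centralized place"; in practice that store could be a Fluo table or an external index. Event times are assumed to be wall-clock Instants carried by the events themselves, not Fluo oracle timestamps.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical "last occurrence" store plus the staleness check a poller would run. */
final class LastOccurrenceStore {
  private final Map<String, Instant> lastSeen = new ConcurrentHashMap<>();

  /** Keeps the newest timestamp per event type, so out-of-order updates are harmless. */
  void record(String eventType, Instant eventTime) {
    lastSeen.merge(eventType, eventTime, (a, b) -> a.isAfter(b) ? a : b);
  }

  /** Watched event types with no event within `threshold` of the clock's current time. */
  List<String> staleTypes(Iterable<String> watchedTypes, Duration threshold, Clock clock) {
    Instant cutoff = clock.instant().minus(threshold);
    List<String> stale = new ArrayList<>();
    for (String type : watchedTypes) {
      Instant last = lastSeen.get(type);
      // A type that has never been seen is also treated as "absent".
      if (last == null || last.isBefore(cutoff)) {
        stale.add(type);
      }
    }
    return stale;
  }
}
```

Passing a Clock makes the threshold check easy to test with a fixed time; reporting never-seen types as stale is one possible policy choice, not the only one.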

Christopher


third party service to poll Fluo for absence of event

2017-01-31 Thread Meier, Caleb
Hello,

I’m looking into using Fluo to develop an event-based notification system that
incrementally generates events of increasing complexity.  The one issue that
I’m running into is how to handle the non-event event.  That is, Fluo (as I
understand it) is not well-suited to handle the following request: “generate a
notification if no events of a given type have occurred within the last 24
hours”.  This is because it is a push-based notification framework that only
generates notifications when things actually happen.  So the question is, has 
anyone looked into developing a service for generating notifications at regular 
intervals (even if something doesn’t happen) that works with Fluo?  I’m toying 
with the idea of creating some sort of Twill application that tells Fluo to 
wake up at regular intervals to generate a notification about the set of events 
falling within the given time window. Before doing this I just wanted to make 
sure that something like this does not already exist, and I also want to get a 
sense of how bad an idea it is to delegate some of the logic of this periodic 
notification service to Fluo.   Would it be better to separate out the temporal 
portion of my notification request to be processed entirely outside of Fluo to 
avoid transactional overhead?




RE: debugging fluo

2016-11-01 Thread Meier, Caleb
Hey Keith,

Not seeing the worker logs in my container.  Think my logback is configured 
incorrectly...

[root@c190sv193 container_1476563020088_0069_01_02]# grep Rolling stdout
18:36:49,174 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - About 
to instantiate appender of type 
[ch.qos.logback.core.rolling.RollingFileAppender]
18:36:49,215 |-INFO in 
ch.qos.logback.core.rolling.FixedWindowRollingPolicy@5ab860f4 - No compression 
will be used
18:36:49,230 |-INFO in ch.qos.logback.core.rolling.RollingFileAppender[FILE] - 
Active log file name: 
io.fluo.log.dir_IS_UNDEFINED/io.fluo.log.app_IS_UNDEFINED_io.fluo.log.host_IS_UNDEFINED.log
18:36:49,230 |-INFO in ch.qos.logback.core.rolling.RollingFileAppender[FILE] - 
File property is set to 
[io.fluo.log.dir_IS_UNDEFINED/io.fluo.log.app_IS_UNDEFINED_io.fluo.log.host_IS_UNDEFINED.log]

-Original Message-
From: Keith Turner [mailto:ke...@deenlo.com] 
Sent: Tuesday, November 01, 2016 4:33 PM
To: dev@fluo.incubator.apache.org
Subject: Re: debugging fluo

Caleb,

I just ran Fluo locally with Uno.  Below is some info I am seeing.
Do you see anything about RollingFileAppender in the stdout file?

$ pwd
/home/kturner/uno/install/logs/yarn/application_1478030941494_0001/container_1478030941494_0001_01_04
$ ls
stderr  stdout  worker_1_host1.log
$ grep Rolling stdout
16:17:31,557 |-INFO in ch.qos.logback.core.joran.action.AppenderAction
- About to instantiate appender of type
[ch.qos.logback.core.rolling.RollingFileAppender]
16:17:31,574 |-INFO in
ch.qos.logback.core.rolling.FixedWindowRollingPolicy@bc1b008 - No compression 
will be used
16:17:31,583 |-INFO in
ch.qos.logback.core.rolling.RollingFileAppender[FILE] - Active log file name: 
/home/kturner/uno/install/logs/yarn/application_1478030941494_0001/container_1478030941494_0001_01_04/worker_1_host1
16:17:31,583 |-INFO in
ch.qos.logback.core.rolling.RollingFileAppender[FILE] - File property is set to 
[/home/kturner/uno/install/logs/yarn/application_1478030941494_0001/container_1478030941494_0001_01_04/worker_1_host1]

Keith


RE: debugging fluo

2016-11-01 Thread Meier, Caleb
Hey Mike,

So I’m not seeing any worker logs on my machines.  The search

find / | grep worker

yielded nothing.  Any ideas as to why these don't exist?  I'm not sure
that my fluo.log.dir system property is successfully being set by the 
LogbackUtil class.
Maybe this has something to do with it?


  


-Original Message-
From: Mike Walch [mailto:mwa...@apache.org] 
Sent: Tuesday, November 01, 2016 4:03 PM
To: dev@fluo.incubator.apache.org
Subject: Re: debugging fluo

Were you able to find a worker_*.log file for each of your workers?

Below are some tips for debugging:

- Each YARN container should have a 'stdout' and 'stderr' file.  These files
may have helpful error messages, especially if a worker failed to start.
Also, any calls to System.out and System.err in your observer will be printed
to these files.
- When running Fluo in YARN, Fluo must use Logback for logging (due to a hard
requirement by Twill). Logback is configured using
/path/to/fluo/conf/logback.xml.  You should review this configuration, but the
root logger is configured by default to print any message at the debug
level or higher.
- If you configured multiple workers, each worker will run in a different
container and have a different worker_*.log file.
- When a worker starts up, it prints its configuration to worker_*.log.
Make sure that you configured your observers using the property
'fluo.observer.*'.
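The search Caleb ran ('find / | grep worker') can be narrowed to a YARN log root, such as a directory listed in 'yarn.nodemanager.log-dirs'. The sketch below is a hypothetical helper, not part of Fluo, that recursively lists files matching worker_*.log, the per-worker naming described in the tips above (Keith's container shows worker_1_host1.log).

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper (not part of Fluo) that locates worker log files. */
final class WorkerLogFinder {
  /** Recursively lists regular files named worker_*.log under logRoot, sorted by path. */
  static List<Path> findWorkerLogs(Path logRoot) throws IOException {
    // The glob is matched against the file name only, not the full path.
    PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:worker_*.log");
    // Files.walk keeps directory handles open, so close the stream when done.
    try (Stream<Path> paths = Files.walk(logRoot)) {
      return paths
          .filter(Files::isRegularFile)
          .filter(p -> matcher.matches(p.getFileName()))
          .sorted()
          .collect(Collectors.toList());
    }
  }
}
```

Run against each node's log directory, an empty result everywhere suggests the worker logs are being written somewhere other than the container log directories.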

-Mike

On Tue, Nov 1, 2016 at 3:33 PM Meier, Caleb <caleb.me...@parsons.com> wrote:

> Do you have any tips for how to make Observers log to the log files 
> found in the directory specified by 'yarn.nodemanager.log-dirs'?
>
> -Original Message-
> From: Mike Walch [mailto:mwa...@apache.org]
> Sent: Tuesday, November 01, 2016 2:36 PM
> To: dev@fluo.incubator.apache.org
> Subject: Re: debugging fluo
>
> Hi Caleb,
>
> The logs for a Fluo application can be found in YARN but they are 
> tricky to find. Fluo should have better documentation on this which I will 
> add now.
>
> The easiest way to view the logs for a Fluo application is to use the 
> web interface for the YARN resource manager
> (http://localhost:8088/cluster).
> First, click on the application ID (i.e., application_*) of your Fluo
> application and then click on the latest attempt ID (appattempt_*).
> You should see a list of containers.  There should be a container for 
> the application master (typically container 1), a Fluo oracle 
> (typically container 2), and Fluo workers (containers 3+).  You can 
> view the log files produced by a container by clicking on its 'logs' 
> link.  Logs from Fluo observers will be in the worker_*.log file for 
> each of your worker containers.
>
> If you don't want to use the YARN resource manager web interface, you
> can also view these logs in the directory specified by
> 'yarn.nodemanager.log-dirs' of your 'yarn-site.xml' config.  This
> method works well on one machine, but on a cluster your containers will
> probably be on different machines. See the YARN documentation below
> for more info about this property:
>
>
> https://hadoop.apache.org/docs/r2.7.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
>
> Best,
> Mike
>


debugging fluo

2016-11-01 Thread Meier, Caleb
Hello,

I'm attempting to debug a Fluo application and am having difficulty locating
the logs for my observers.  I've looked within the logs for hadoop-yarn, but am
not seeing any logging statements for my observers.  Where do observers log out
of the box in a normal Cloudera distribution?  Do I need to do something else
in addition to logging to get my observers to generate logs?

Thanks,
Caleb


oracle client not providing timestamp

2016-07-07 Thread Meier, Caleb
Hello,

I'm currently trying to create an integration test which uses
AccumuloExportITBase from io.fluo.recipes.test.  I am trying to run this test
on a Linux VM.  I currently cannot create a new Transaction from my FluoClient
because the OracleClient will not issue a timestamp.  My program just hangs in
the loop within the getStamp() method in the OracleClient, printing the
following log message: "Waiting for timestamp from Oracle. Is it running?
waitTotal=7s waitPeriod=8s,..."  This goes on for a few minutes before I
finally kill the program.  Has this happened to anyone else?  Any suggestions
on how to resolve the problem?

Thanks,
Caleb Meier

order guarantees on event processing

2016-07-05 Thread Meier, Caleb
Hello,

We are currently using Fluo to incrementally update precomputed queries for the
RDF triplestore Rya, which is another incubating Apache project.  We have a
strategy in place for updating our queries as new triples are ingested, and we
are now working on delete support.

I'm wondering if Fluo provides any guarantees on the order in which it
processes notifications.  Are notifications for each observer added to a queue?
For example, if Column C has an observer O and event e1 is written before
event e2, is there a guarantee that O will process e1 before e2?  The reason
I'm curious about these guarantees is that we want to address the case of one 
user adding a triple and another user deleting the same triple.  We would like 
the changes to percolate through our observers in a manner that is consistent 
with the order in which the triples are written/deleted from the triples column.

If there are no guarantees, do you have any suggestions as to how to enforce 
notification processing order?  Our initial thought is to require our 
add/delete observer to only process the event with the lowest add/delete 
timestamp.  That is, if O attempts to process e2, it checks for the event with 
the lowest timestamp (e1) and ignores e2 until it encounters and processes e1.  
This seems like it could create a huge bottleneck given that it may take a 
large number of cycles before it processes e1.  Please let me know if you need 
any additional info for added context.
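The lowest-timestamp-first idea proposed above can be modeled in memory to make the deferral behavior concrete. This is a hypothetical sketch, not a Fluo API and not a statement about Fluo's ordering guarantees: pending add/delete events for each triple are kept sorted by timestamp, and an attempt to process anything other than the earliest pending event is deferred.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical in-memory model of "only process the earliest pending event per triple". */
final class LowestTimestampFirst {
  enum Op { ADD, DELETE }

  // Pending events per triple, sorted by timestamp (at most one event per timestamp assumed).
  private final Map<String, NavigableMap<Long, Op>> pending = new HashMap<>();
  // Applied events, in the order they were applied.
  private final List<String> applied = new ArrayList<>();

  void write(String triple, long timestamp, Op op) {
    pending.computeIfAbsent(triple, k -> new TreeMap<>()).put(timestamp, op);
  }

  /** Returns true if the event was applied, false if it must wait (or is unknown). */
  boolean tryProcess(String triple, long timestamp) {
    NavigableMap<Long, Op> events = pending.get(triple);
    if (events == null || events.isEmpty() || events.firstKey() != timestamp) {
      // Unknown, already processed, or an earlier event is still pending: defer.
      return false;
    }
    Op op = events.pollFirstEntry().getValue();
    applied.add(op + " " + triple + " @" + timestamp);
    return true;
  }

  List<String> applied() {
    return applied;
  }
}
```

Each deferral means the observer has to be triggered again later for the same event, which is the bottleneck the message worries about. The sketch is not thread-safe and ignores persistence.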

Thanks,
Caleb Meier