Re: Operator checkpointing not working

2018-06-18 Thread Thomas Weise
The default checkpoint interval is 30s and the interval between failing aggregators is approximately 10s? In that case, no state will ever get checkpointed and operator reset to initial state. Thomas -- sent from mobile On Mon, Jun 18, 2018, 12:45 PM Mateusz Zakarczemny wrote: > Hi Pramod, >

Re: Recommended way to query hive and capture output

2017-10-23 Thread Thomas Weise
Have you explored using the JDBC interface to read from Hive? Apex has JDBC connectors and it should be possible to use them with Hive. Thanks, Thomas On Mon, Oct 23, 2017 at 11:52 AM, Vivek Bhide wrote: > Hi > > I would like to know if there is any recommended way to

Re: how to find application master URL ( non proxy )

2017-09-26 Thread Thomas Weise
setting log levels is one of > the desirable feature. The limitation on prod environment is that we don't > access to all the workers nodes where potentially the AppMaster could be > running and GUI runs from edge node. > > Sunil Parmar > > On Tue, Sep 26, 2017 at 1:23 PM, Tho

Re: Malhar input operator for connecting to Teradata

2017-08-28 Thread Thomas Weise
The JDBC poll operator uses a query translator that does not support all dialects in the OS version. For a one-off with Oracle, the following override for post processing did the trick: @Override protected String buildRangeQuery(Object key, int offset, int limit) { String

Re: How the application recovery works when its started with -originalAppId

2017-08-10 Thread Thomas Weise
There are couple bugs that were recently identified that look related to this: https://issues.apache.org/jira/browse/APEXMALHAR-2526 https://issues.apache.org/jira/browse/APEXCORE-767 Perhaps the fix for first item is what you need? Thomas On Thu, Aug 10, 2017 at 8:41 PM, Pramod Immaneni

Re: How to consume from two different topics in one apex application

2017-08-03 Thread Thomas Weise
I don't think you can use both Kafka client 0.8.x and 0.9.x within the same application. The dependencies overlap and will conflict. You can use 0.8 client to talk to 0.9 server but since you want to use SSL that's not possible (security was only added in 0.9). I have not tried that, but you

Re: Apache Apex with Apache slider

2017-08-01 Thread Thomas Weise
Sounds like what you want is a Slider application that monitors the Apex application? Yes, that would be possible. The Slider app/script could use the RM to locate the app and check the YARN status and then through the Apex AM REST API to poll the Apex metrics. Thomas On Tue, Aug 1, 2017 at

Kafka producer does not propagate error when unable to connect to broker

2017-07-16 Thread Thomas Weise
I noticed that when the Kafka output operator (0.9) is configured with a wrong port or missing the security setup required by the cluster, it will be in "ACTIVE" state but not do anything, essentially ignoring the error. I see that part of the problem is that the producer is async. That makes me

Apex HTTP Authentication

2017-07-12 Thread Thomas Weise
I'm working on a secure cluster that has authentication enabled for the YARN services. In my Apex setup, I have: apex.attr.STRAM_HTTP_AUTHENTICATION DISABLE "DISABLE - Disable authentication for web services." That's not what happens though, it rather follows the Hadoop setting

Re: How to consume from Kafka-V10 (SSL enabled) topic using Apex

2017-07-03 Thread Thomas Weise
You can use the operators that work with 0.9 Kafka client to access 0.10 Kafka cluster (Malhar release 3.7 has only operators that work with 0.9 client, next release will have 0.10 client support). Thomas On Mon, Jul 3, 2017 at 4:52 AM, Chaitanya Chebolu wrote: > Hi

Re: JDBC poll operator performance

2017-06-27 Thread Thomas Weise
he same order. Depends on the >> implementation and file format of the given db. >> >> ~ Bhupesh >> >> >> ___ >> >> Bhupesh Chawda >> >> E: bhup...@datatorrent.com | Twitter: @bhupeshsc >

JDBC poll operator performance

2017-06-26 Thread Thomas Weise
Hi, It seems the poll operator performs unnecessary operations in the case where the "key" column values in the source table are monotonic increasing. There should be no need to sort or do count selects. Instead it should be sufficient to just filter with the key range. Let's say the key column

Re: What is recommended way to achieve exactly once tuple processing in case of operator failure scenario

2017-06-17 Thread Thomas Weise
There is no way to avoid reprocessing tuples if the goal is to achieve exactly-once results. Please have a look at http://apex.apache.org/docs.html - presentation about fault tolerance and blog "End-to-end exactly-once". Platform alone cannot guarantee exactly-once results. Operators mutating

Re: Windowing

2017-06-09 Thread Thomas Weise
Here is an example that uses the window operator to compute top words and time series from Tweets (and optionally outputs the results to websocket): https://github.com/tweise/apex-samples/tree/master/twitter On Thu, Jun 8, 2017 at 11:11 AM, Tushar Gosavi wrote: > You

Re: trying to run Apex on Hadoop cluster

2017-06-06 Thread Thomas Weise
Claire, There shouldn't be a need to run the pipeline like this since the Apex runner already has the support to launch hadoop with the required dependencies. Can you please confirm that you are able to run the basic word count example as shown here:

Re: To run Apex runner of Apache Beam

2017-06-02 Thread Thomas Weise
It may also be helpful to enable logging to see if there are any exceptions in the execution. The console log only contains output from Maven. Is logging turned off (or a suitable slf4j backend missing)? On Fri, Jun 2, 2017 at 5:00 PM, Thomas Weise <t...@apache.org> wrote: >

Re: To run Apex runner of Apache Beam

2017-06-02 Thread Thomas Weise
Hi Claire, The stack traces suggest that the application was launched and the Apex DAG is deployed and might be running (it runs in embedded mode). Do you see any output? In case it writes to files, anything in output or output staging directories? Thanks, Thomas On Thu, Jun 1, 2017 at 3:18

One Year Anniversary of Apache Apex

2017-04-25 Thread Thomas Weise
Hi, It's been one year for Apache Apex as top level project, congratulations to the community! I wrote this blog to reflect and look ahead: http://www.atrato.io/blog/2017/04/25/one-year-apex/ Your comments and suggestions are welcome. Thanks, Thomas

Re: set attribute of type Map in properties.xml

2017-04-14 Thread Thomas Weise
Ram, Is there are reason why this is not reflected here: http://apex.apache.org/docs/apex/application_packages/ Thanks, Thomas On Wed, Apr 12, 2017 at 5:35 PM, Munagala Ramanath wrote: > http://docs.datatorrent.com/application_packages/ > > Please take a look at the

[ANNOUNCE] Apache Apex Malhar 3.7.0 released

2017-04-05 Thread Thomas Weise
Dear Community, The Apache Apex community is pleased to announce release 3.7.0 of the Apex Malhar library. Apache Apex is an enterprise grade big data-in-motion platform that unifies stream and batch processing. Apex was built for scalability and low-latency processing, high availability and

Re: 3.5.0 apex core build failing with hadoop 2.7.3 dependency

2017-03-23 Thread Thomas Weise
Bummer. Thanks Ram, I was just going to look at it. The binaries work on 2.7.x though. Will the issue only show in secure mode? Please create a JIRA in APEXCORE. Chinmay, you may need to apply a patch as part of your build to be able to use the 3.5.0 sources. Thomas On Thu, Mar 23, 2017 at

Re: Apex installation instructions

2017-03-22 Thread Thomas Weise
Hi, Those instructions are indeed for running in a dev environment only. There are a few other download options: http://apex.apache.org/downloads.html The Bigtop binaries can be used, but they are designed to also install Hadoop. I believe you are looking to just install the Apex CLI on an

Re: Need suggestion about temporary file usage on container node

2017-03-18 Thread Thomas Weise
wrote: > Thanks Thomas. Could you suggest a way so that I can figure out the path > to particular localized file on container node preferably using > OperatorContext ? > > > On Fri, Mar 17, 2017 at 8:06 PM, Thomas Weise <t...@apache.org> wrote: > >> If you

Re: JavaDocs for Apex Core now available

2017-03-17 Thread Thomas Weise
Yay! Will update the web site soon. -- sent from mobile https://ci.apache.org/projects/apex-core/apex-core- javadoc-release-3.4/index.html https://ci.apache.org/projects/apex-core/apex-core- javadoc-release-3.5/index.html ... at the above links. Ram --

Re: Need suggestion about temporary file usage on container node

2017-03-17 Thread Thomas Weise
If you use LIB_JARS or FILES, then those files will be localized by YARN on the container node, you don't need to manually copy them from HDFS or write cleanup code for it. Thomas On Fri, Mar 17, 2017 at 12:49 AM, vikram patil wrote: > Hello All, > > I am working on

Re: One-time Initialization of in-memory data using a data file

2017-01-23 Thread Thomas Weise
First link without frame: https://ci.apache.org/projects/apex-malhar/apex-malhar-javadoc-release-3.6/org/apache/apex/malhar/lib/state/managed/AbstractManagedStateImpl.html On Mon, Jan 23, 2017 at 3:33 PM, Thomas Weise <t...@apache.org> wrote: > Roger, > > An Apex operator typica

Re: One-time Initialization of in-memory data using a data file

2017-01-23 Thread Thomas Weise
Roger, An Apex operator typically holds state that it uses for processing and often that state is mutable. For large state: "Managed state" in Malhar (and its predecessor HDHT) were designed for large state that can be mutated efficiently under a specific write pattern (semi ordered keys).

[ANNOUNCE] Apache Apex Core 3.5.0 released

2016-12-19 Thread Thomas Weise
Dear Community, The Apache Apex community is pleased to announce release 3.5.0 of Apex Core (the engine). Apache Apex is an enterprise grade big data-in-motion platform that unifies stream and batch processing. Apex was built for scalability and low-latency processing, high availability and

[ANNOUNCE] Apache Apex Malhar 3.6.0 released

2016-12-08 Thread Thomas Weise
Dear Community, The Apache Apex community is pleased to announce release 3.6.0 of the Malhar library. The release resolved 70 JIRAs . The release adds first iteration of SQL support via Apache Calcite. Features include SELECT, INSERT, INNER JOIN with non-empty equi

Re: [EXTERNAL] Re: KafkaSinglePortInputOperator

2016-12-08 Thread Thomas Weise
If this was fixed, why is the JIRA still open? On Wed, Dec 7, 2016 at 10:58 PM, Chaitanya Chebolu < chaita...@datatorrent.com> wrote: > There was a bug in Malhar-3.4.0 and is fixed in Malhar-3.6.0. > JIRA details of this issue is here > . >

Re: Connection refused exception

2016-11-29 Thread Thomas Weise
There was a bug in the 3.5 release, this could be it. We fixed it and put it into 3.5.1-snapshot. The detail are in JIRA, I don't have access to it right now. -- sent from mobile On Nov 29, 2016 6:02 AM, "Feldkamp, Brandon (CONT)" < brandon.feldk...@capitalone.com> wrote: > This pops up a little

Re: Example using Streaming SQL ?

2016-11-18 Thread Thomas Weise
Hi, This is a new feature and the first cut will go into release 3.6. There are examples here: https://github.com/apache/apex-malhar/tree/master/demos/sql Documentation isn't in place yet. Chinmay, do you have some more information to share? Thanks, Thomas On Fri, Nov 18, 2016 at 5:58 PM,

Re: [EXTERNAL] kafka

2016-11-01 Thread Thomas Weise
IMO this should be added here: http://apex.apache.org/docs/malhar/operators/kafkaInputOperator/ On Tue, Nov 1, 2016 at 5:24 PM, hsy...@gmail.com wrote: > Hey Raja, > > The setup for secure kafka input operator is not easy. You can follow > these steps. > 1. Follow kafka

Re: Datatorrent operator for Hbase

2016-10-20 Thread Thomas Weise
This may also help: http://docs.datatorrent.com/troubleshooting/#hadoop-dependencies-conflicts On Thu, Oct 20, 2016 at 11:39 AM, Thomas Weise <t...@apache.org> wrote: > Please see the HBase dependency and its exclusions here: > > https://github.com/apache/apex-malhar/blob

Re: Datatorrent operator for Hbase

2016-10-20 Thread Thomas Weise
gt; } >> {code} >> >> - Tushar. >> >> >> On Thu, Oct 20, 2016 at 11:59 AM, Jaspal Singh >> <jaspal.singh1...@gmail.com> wrote: >> > Hi Thomas, Thanks for sharing this example code. >> > Still I couldn't see where the hbase tablename is configur

Re: balanced of Stream Codec

2016-10-15 Thread Thomas Weise
Without knowing the operations following the indeterministic partitioning, assume that you cannot have exactly-once results because processing won't be idempotent. If there are only stateless operations, then it should be OK. If there are stateful operations (windowing with any form of aggregation

Re: DT Fault Tolerance UHG

2016-10-13 Thread Thomas Weise
If the tuples processed by the output operator are not of type String, then the recovery code may fail because it attempts to interpret the messages that were already stored as String. That's a bug in the operator. The workaround is to convert the object to String in the upstream operator and then

Re: KafkaSinglePortExactlyOnceOutputOperator

2016-10-12 Thread Thomas Weise
Yes, it may be better to pull out the topic selection into an upstream operator that emits the messages on separate ports per topic and then you can use the exactly-once output operator without customization. On Wed, Oct 12, 2016 at 11:29 AM, Bandaru, Srinivas < srinivas.band...@optum.com>

Re: DataTorrent application Parllel processing

2016-10-11 Thread Thomas Weise
There is some draft documentation for 0.9 here: https://github.com/chaithu14/incubator-apex-malhar/blob/APEXMALHAR-2242-kafka0.9doc/docs/operators/kafkaInputOperator.md#09-version-of-kafka-input-operator On Tue, Oct 11, 2016 at 3:21 PM, Thomas Weise <thomas.we...@gmail.com> wrote:

Re: DataTorrent application Parllel processing

2016-10-11 Thread Thomas Weise
Hi, There is general information on partitioning here: http://apex.apache.org/docs/apex/application_development/#partitioning and a more recent presentation here: http://www.slideshare.net/ApacheApex/smart-partitioning-with-apache-apex-webinar For MapR streams you are using the 0.9 operator,

Re: Datatorrent fault tolerance

2016-10-06 Thread Thomas Weise
; > One more thing, so you mentioned about checkpointing the offset ranges > to replay in same order from kafka. > > Is there any property we need to configure to do that? like initialOffset > set to APPLICATION_OR_LATEST. > > > Thanks > Jaspal > > > On Th

Re: Datatorrent fault tolerance

2016-10-06 Thread Thomas Weise
output. > > What if any previous operators fail ? How we can make sure they also > recover using EXACTLY_ONCE processing mode ? > > > Thanks > Jaspal > > > On Thursday, October 6, 2016, Thomas Weise <thomas.we...@gmail.com> wrote: > >> In that case plea

Re: Datatorrent fault tolerance

2016-10-06 Thread Thomas Weise
Hi, It would be necessary to know a bit more about your application for specific recommendations, but from what I see above, a few things don't look right. It appears that you are setting the processing mode on the input operator, which only reads from Kafka. Exactly-once is important for

Re: Apex and Malhar Java 8 Certified

2016-10-03 Thread Thomas Weise
Apex is built against Java 7 and expected to work as is on Java 8 (Hadoop distros still support 1.7 as well). Are you running into specific issues? Thanks, Thomas On Mon, Oct 3, 2016 at 12:06 PM, McCullough, Alex < alex.mccullo...@capitalone.com> wrote: > Hey All, > > > > I know there were

Re: datatorrent gateway application package problem

2016-09-29 Thread Thomas Weise
Glad you have it working now. And yes, the gateway (like the apex CLI) only depends on the hadoop script. Different distributions of Hadoop have different ways of managing their bits and we want to be agnostic. So if you would like to use a different conf dir, then you would make that change

Re: datatorrent gateway application package problem

2016-09-26 Thread Thomas Weise
Hi, Can you try to launch the application using apex CLI instead of the UI? That might help to determine if it is a problem with the Hadoop install or the gateway: http://apex.apache.org/docs/apex/apex_cli/ Thanks, Thomas On Mon, Sep 26, 2016 at 5:04 PM, David Yan

Re: Can Apex launch command be submitted without using Apex CLI interactively?

2016-09-13 Thread Thomas Weise
Hi, The apex CLI an also be used in batch mode -e you can also use a script file like so: apex < mycmds.txt Other apex CLI documentation is here: http://apex.apache.org/docs/apex/apex_cli/ We will add above to it. HTH, Thomas On Tue, Sep 13, 2016 at 1:02 PM, Meemee Bradley

Re: Optional output ports.

2016-09-13 Thread Thomas Weise
> > > > Thanks, > > Alex > > > > *From: *Thomas Weise <thomas.we...@gmail.com> > *Reply-To: *"users@apex.apache.org" <users@apex.apache.org> > *Date: *Tuesday, September 13, 2016 at 11:58 AM > > *To: *"users@apex.apache.org" <user

Re: Optional output ports.

2016-09-13 Thread Thomas Weise
orts is true, but there is a requirement > implemented by the validator that negates this unless you explicitly set it > to on all your output ports (to the default no less). > > > > > > Thanks, > > Alex > > > > *From: *Thomas Weise <t...@apache.org> > *Rep

Re: Optional output ports.

2016-09-13 Thread Thomas Weise
That's right, if there are multiple output ports, the validation demands that at least one is connected. I actually think this validation is incorrect. It should be up to the application developer to decide how the output of an operator is consumed. It is similar to the return value of a

Re: HDHT operator

2016-09-12 Thread Thomas Weise
t; > On Mon, Sep 12, 2016 at 5:43 PM, Thomas Weise <thomas.we...@gmail.com> > wrote: > >> Here is an example for the equivalent functionality in Malhar: >> >> https://github.com/apache/apex-malhar/blob/master/benchmark/ >> src/main/java/com/datatorrent/benchmark/stat

[ANNOUNCE] Apache Apex Malhar 3.5.0 released

2016-09-05 Thread Thomas Weise
Dear Community, The Apache Apex community is pleased to announce release 3.5.0 of the Malhar library. The release resolved 63 JIRAs and comes with exciting new features and enhancements, including: - High level Java stream API now supports stateful transformations with Apache Beam style

Re: Apex Application failing to run.

2016-09-04 Thread Thomas Weise
Since it is the virtual memory, please see: http://docs.datatorrent.com/configuration/#hadoop-tuning -- sent from mobile On Sep 4, 2016 12:10 PM, "Munagala Ramanath" wrote: > It looks like there is not enough memory for all the containers. If you're > running in the

Re: Support for dynamic topology

2016-08-30 Thread Thomas Weise
Hi, This depends on the operator Z. If it has multiple input ports and those are optional, you can add P, then connect P to Z (and X), then remove Y. If Z has a single port, then Z and everything downstream would need to be removed or else the change won't result in a valid DAG and won't be

Re: kryo Serealization Exception

2016-08-22 Thread Thomas Weise
TCli.java:2050) > at com.datatorrent.stram.cli.DTCli.launchAppPackage(DTCli.java:3456) at > com.datatorrent.stram.cli.DTCli.access$7100(DTCli.java:106) at > com.datatorrent.stram.cli.DTCli$LaunchCommand.execute(DTCli.java:1895) at > com.datatorrent.stram.cli.DTCli$3.run(DTCli.java:1449) Ca

Re: kryo Serealization Exception

2016-08-22 Thread Thomas Weise
There is some information available here: http://docs.datatorrent.com/troubleshooting/#application-throwing-following-kryo-exception If the object is Java serializable, you can set the stream codec or wrap into KryoJdkContainer:

Re: Local CLI Logging

2016-08-19 Thread Thomas Weise
Hi Alex, The CLI is automatically changing the threshold for the log4j console appender. You can supply - to the CLI to enable output at debug level. You can also supply your own log4j.properties with customized loggers through the environment (the example below also includes how you enable

Re: Round Robin Partitioning with Dynamic Partitioning

2016-08-12 Thread Thomas Weise
If idempotency is needed (replay on recovery) and the order of tuples in the stream can change within a window (multiple upstream partitions), then it may be better to use a stateless hash function that ensures even distribution. On Fri, Aug 12, 2016 at 10:10 AM, Munagala Ramanath

Re: Kafka Input Operator questions

2016-06-29 Thread Thomas Weise
Hi Eric, I would recommend to use the latest version of Apex Core (3.4.0) as well as the latest version of Malhar (3.4.0 as well). My responses inline based on above versions: On Wed, Jun 29, 2016 at 5:54 AM, Martin, Eric wrote: > Hi, > > > > I have a few quick

Re: Kafka input operator

2016-06-19 Thread Thomas Weise
kafka.message.Message is the problem, MutablePair has a no-arg constructor and should be serializable for Kryo, On Sun, Jun 19, 2016 at 3:10 PM, wrote: > The Pairs in Apache common are not Kryo serializable. You can use other > pair data structure. For example KeyValuePair in

Re: A few questions about the Kafka 0.9 operator

2016-06-14 Thread Thomas Weise
You can also have a look at this blog and linked example that specifically covers exactly-once with input from Kafka: https://www.datatorrent.com/blog/end-to-end-exactly-once-with-apache-apex/ On Tue, Jun 14, 2016 at 2:47 PM, Thomas Weise <thomas.we...@gmail.com> wrote: > See respo

Re: A few questions about the Kafka 0.9 operator

2016-06-14 Thread Thomas Weise
See response below: On Tue, Jun 14, 2016 at 12:41 PM, Ananth Gundabattula < agundabatt...@gmail.com> wrote: > Hello Siyuan/All, > > I have a couple of questions regarding the Kafka 0.9 operator. Could you > please help me in understanding this operator a bit better? > > >- As stated in >

Re: kafka input is processing records in a jumbled order

2016-06-07 Thread Thomas Weise
Raja, Are you expecting ordering across multiple Kafka partitions? All messages from a given Kafka partition are received by the same consumer and thus will be ordered. However, when messages come from multiple partitions there is no such guarantee. Thomas On Tue, Jun 7, 2016 at 3:34 PM,

Re: kafka offset commit

2016-06-06 Thread Thomas Weise
Hi Raja, Which Kafka version are you using? With the new 0.9 connector there is no need for the offset manager: https://github.com/apache/apex-malhar/tree/master/kafka/src/main/java/org/apache/apex/malhar/kafka Thanks, Thomas On Mon, Jun 6, 2016 at 3:06 PM, Raja.Aravapalli

Re: avrò deserialization fails when using kafka

2016-06-06 Thread Thomas Weise
Since you are creating the decoder in setup(), please mark the property transient. No need to checkpoint it. Thanks, Thomas On Mon, Jun 6, 2016 at 10:06 AM, Munagala Ramanath wrote: > >