Re: Kryo version and default serializer

2018-06-20 Thread Vlad Rozov
Hi Aaron, Nice to see more contributors to the Apex! Welcome. I'd suggest that we stop this e-mail thread and continue discussion on the other dev@apex only e-mail thread. Usually users@apex is not a good choice to discuss dependency upgrades. Please be patient, majority of Apex and Apache

Re: High CPU % even after increasing VCORES

2018-05-01 Thread Vlad Rozov
I am not sure I understand your assumptions regarding VCORES and CPU usage and/or number of tuples emitted per second. Operators and ports in Apex have high thread affinity, so with a single operator deployed into a container, it does not matter how many VCORES that container has. Thank you,

Re: TimeBasedDedupOperator failing with NullPointer

2018-03-23 Thread Vlad Rozov
IMO, it is a regression introduced in 3.8.0, not a configuration/infrastructure or a NULL key issue. File a JIRA. Thank you, Vlad On 3/23/18 10:07, Vivek Bhide wrote: Hi Vlad, Yes, its 3.8.0 Regards Vivek -- Sent from: http://apache-apex-users-list.78494.x6.nabble.com/

Re: How to specify more than one fields as a dedup keyExpression

2017-10-23 Thread Vlad Rozov
I don't think that Apex expression evaluator is that smart :). Try "{$}.getId() + {$}.getId1()" or provide a getter that returns pair object. Thank you, Vlad On 10/23/17 18:13, Munagala Ramanath wrote: It needs to be an expression that combines both (or all) values: try "id + id1" Ram On

Re: Apache Apex with Apache slider

2017-08-04 Thread Vlad Rozov
What will be a reason to use Apex cli over Launcher API? Thank you, Vlad On 8/1/17 11:59, Thomas Weise wrote: Slider let's you package the binaries (along with the .apa), so that would eliminate the need for any extra install. On Tue, Aug 1, 2017 at 11:54 AM, Pramod Immaneni

Re: Processing speed units

2017-06-16 Thread Vlad Rozov
Questions related to a vendor specific tool do not belong to the list. Please check directly with the vendor. Thank you, Vlad On 6/15/17 22:53, Anushree Gupta wrote: Hi All On Data Torrent console the processing speed of an application is given in a format like 'x processed/s' or 'y

Re: Is there a way to schedule an operator?

2017-06-15 Thread Vlad Rozov
I agree. IMO, application scheduling is not part of a streaming engine functionality and there are plenty of other projects that can help with it. A streaming engine whether in batch or streaming use case needs to support - watermarks and triggers (with few exceptions mostly supported by

Re: Is there a way to schedule an operator?

2017-06-14 Thread Vlad Rozov
Use Apex client to start application or use REST API to start application if you have DataTorrent RTS license from you cron job. Thank you, Vlad On 6/13/17 15:30, Guilherme Hott wrote: I was thinking about to use a cron sending a kafka message and in my DAG I'll have a kafka input operator

Re: Hdfs + apex-core

2017-06-06 Thread Vlad Rozov
Do you have partition.assignment.strategy set anywhere in the Kafka operator properties? Thank you, Vlad On 6/5/17 10:14, userguy wrote: Did any one got chance to look at the code .. or do we have template to read from multiple topics in kafka ??? -- View this message in context:

Re: Container restart after operator failure

2017-06-06 Thread Vlad Rozov
It is a portion of the application master log. In the same log, check what container fails first and check that container for errors and exceptions. apex/malhar developers, any volunteer to see why the operator fails during recovery? Thank you, Vlad On 6/5/17 23:17, Priyanka Gugale wrote:

Re: To run Apex runner of Apache Beam

2017-06-05 Thread Vlad Rozov
Thanks for replying! Here attached my log and stack trace On Thursday, June 1, 2017 1:31 PM, Vlad Rozov <v.ro...@datatorrent.com <mailto:v.ro...@datatorrent.com>> wrote: Hi Claire, Can you provide maven logs? It may also help to obtain stack

Re: To run Apex runner of Apache Beam

2017-06-01 Thread Vlad Rozov
Hi Claire, Can you provide maven logs? It may also help to obtain stack trace. In a separate terminal window, run "jps -lv" and look for the jvm process id that executes org.apache.beam.examples.TfIdf. Use jstack to get the stack trace. Thank you, Vlad On 6/1/17 13:00, Claire Yuan wrote:

Re: Container failure without relaunch

2017-05-31 Thread Vlad Rozov
It may also help to enable DEBUG level logging for com.datatorrent.* once the issue is reproduced again and check activity in the application master logs. Thank you, Vlad On 5/30/17 10:41, Sandesh Hegde wrote: When that issue happens, please check the free resource(CPU and memory) available

Re: Exceeded allowed memory block for AvroFileInputOperator

2017-05-30 Thread Vlad Rozov
The warning means that the downstream operator(s) is not capable to keep up with the upstream operator and tuples are accumulated in the upstream buffer server leading to the increased latency. If the latency is within SLA, you may ignore the warning, otherwise try to partition downstream or

Re: Error while using AvroToPojo operator

2017-05-10 Thread Vlad Rozov
Do not use janino commons-compiler-jdk, it has a bug that prevents it from properly working with PojoUtils. Replace it with janino commons-compiler. Thank you, Vlad On 5/10/17 13:53, bhidevivek wrote: I am trying to read the hdfs audit logs from avro format and trying to convert it to POJO

Re: Apex + DT RTS

2017-05-10 Thread Vlad Rozov
or BigTop? Thanks On 2017-05-10 15:54 (-0700), Vlad Rozov <v.ro...@datatorrent.com> wrote: It is a source release download, not a binary download. You will need to build binaries. Also, documentation assumes an installation from a binary download. Thank you, Vlad On 5/10/17 15:46, apex user wro

Re: Apex + DT RTS

2017-05-10 Thread Vlad Rozov
- 3.6.0(apex-3.6.0-source-release.zip) Thanks On 2017-05-10 15:44 (-0700), Vlad Rozov <v.ro...@datatorrent.com> wrote: Which download do you use? Thank you, Vlad On 5/10/17 14:43, apex user wrote: Hi All, Can I work with Apache Apex independently without installing DataTorrent RTS bi

Re: Apex + DT RTS

2017-05-10 Thread Vlad Rozov
Which download do you use? Thank you, Vlad On 5/10/17 14:43, apex user wrote: Hi All, Can I work with Apache Apex independently without installing DataTorrent RTS binaries? I see from the Apache Apex Documentation, I can access Apex CLI from /bin/apex. But, when I download it from Apache

Re: Window id got stuck

2017-05-09 Thread Vlad Rozov
in a separate containers. There is no functional processing happening in the blocked operators. There is no downstream operators. Please suggest. On Mon, May 8, 2017 at 8:58 PM, Vlad Rozov <v.ro...@datatorrent.com <mailto:v.ro...@datatorrent.com>> wrote: The exception

Re: Window id got stuck

2017-05-08 Thread Vlad Rozov
The exception below means that a downstream operator abruptly disconnected. It is not an indication of a problem by itself. Please check downstream operator container log for exceptions and error messages. Thank you, Vlad On 5/8/17 07:12, Pramod Immaneni wrote: Hi Chiranjeevi, I am

Re: One Year Anniversary of Apache Apex

2017-04-25 Thread Vlad Rozov
Congratulations to the community! Thank you, Vlad On 4/25/17 08:14, Thomas Weise wrote: Hi, It's been one year for Apache Apex as top level project, congratulations to the community! I wrote this blog to reflect and look ahead: http://www.atrato.io/blog/2017/04/25/one-year-apex/ Your

Re: DB connection getting lost

2017-04-18 Thread Vlad Rozov
Do you call JdbcStore from an operator thread or do you share it across multiple threads? In the later case the behavior is undefined. Thank you, Vlad On 4/17/17 02:10, Priyanka Gugale wrote: Can you make sure you have called connect first before createStatement. Also if possible please

Re: Restricting emit speed of AbstractFileInputOperator.

2017-04-10 Thread Vlad Rozov
If it takes t to emit one tuple and t*x < 0.5 sec, emitBatchSize can be set to x. This will help with CPU utilization as platform will put the operator thread to sleep if it does not emit and does not have handleIdleTime() callback defined. Thank you, Vlad On 4/7/17 18:53, Bhupesh Chawda

Re: Blocked Operator

2017-03-07 Thread Vlad Rozov
Please check stack trace of the thread where an operator runs. The main thread in a container is responsible for heartbeat monitoring and what you see in the stack trace is a wait between heartbeats. Thank you, Vlad /Join us at Apex Big Data World-San Jose

Re: Occasional Out of order tuples when emitting from a thread

2017-02-21 Thread Vlad Rozov
Please see Threads section at http://apex.apache.org/docs/apex/development_best_practices/ Note that an operator has only one thread created by the Apex that it can use to interact with the system. Starting with 3.5.0 release, this is enforced by the platform and it will throw an exception if

Re: Debugging operator latency.

2017-02-21 Thread Vlad Rozov
The name of the property is dt.attr. CONTAINER_JVM_OPTIONS. You don't need to replace with the actual path as it is not known till an application and a containers are allocated by Yarn (Apex will replace with the container log directory). You may also set the attribute at the launch time:

Re: Debugging operator latency.

2017-02-17 Thread Vlad Rozov
I would start with taking few stack traces (the latest Datatorrent RTS allows to take a stack trace otherwise use jstack) of the container where the operator is deployed and trying to identify what operator's thread is doing. If there is nothing obvious there, proceed as Ashwin suggested.

Re: Logging

2017-01-08 Thread Vlad Rozov
One option is to use SocketAppender, but I will not recommend using real time logging in one place for production - it is not scalable. At some point logging to a single place will lead to application performance degradation. Thank you, Vlad On 1/5/17 06:11, Doyle, Austin O. wrote: What’s

Re: Data duplication between operators

2016-12-19 Thread Vlad Rozov
This will be a bug unless the downstream operator constantly fails and is restored to a checkpoint in which case it is expected that it may get the same tuple multiple times. Thank you, Vlad On 12/19/16 11:33, Doyle, Austin O. wrote: I’m trying to send some sequential data between an S3

Re: Exception with API

2016-12-12 Thread Vlad Rozov
Hi Brandon, Check the .apa package validity. Try to manually copy the .apa file to the node where gateway is installed, start apex using - option and run "get-app-package-info ". Thank you, Vlad On 12/12/16 07:20, Feldkamp, Brandon (CONT) wrote: Hello, I’m getting the below

Re: Query

2016-12-06 Thread Vlad Rozov
I'd recommend to use additional output port solution outlined by Bhupesh. There are few Apex applications on the field that leverage that solution. Thank you, Vlad On 12/4/16 11:45, Vishal Agrawal wrote: Thank you Bhupesh and Ram. Appreciate your quick response. I see

Re: Packaging Operator Jars

2016-11-30 Thread Vlad Rozov
It is necessary to exclude hadoop jars from the application package. Please see http://docs.datatorrent.com/troubleshooting/#hadoop-dependencies-conflicts Thank you, Vlad On 11/30/16 12:16, Doyle, Austin O. wrote: I was wondering if there was a way to package operator jars with their

Re: Connection refused exception

2016-11-30 Thread Vlad Rozov
access$500(StreamingContainer.java:130) at com.datatorrent.stram.engine.StreamingContainer$2.run(StreamingContainer.java:1438) *From: *Vlad Rozov <v.ro...@datatorrent.com> *Organization: *DataTorrent *Reply-To: *"users@apex.apache.org" <users@apex.apache.org> *Date:

Re: Connection refused exception

2016-11-29 Thread Vlad Rozov
Brandon, Is the stacktrace from the app master log or is it from a local mode run? The exception is raised due to the buffer server termination in one of the operators containers and I guess that it happens when app master tries to purge data from the buffer server. Check the operators

Re: error launching in docker sandbox

2016-11-29 Thread Vlad Rozov
Hi Brandon, No, it should not. Start apex with - option that will provide more details in the log. Thank you, Vlad On 11/29/16 14:07, Feldkamp, Brandon (CONT) wrote: Hello! I’m trying to launch my app in the docker sandbox but I’m getting an error. I’m able to launch the app

Re: balanced of Stream Codec

2016-10-17 Thread Vlad Rozov
Using different hash function will help only in case data is equally distributed across categories. In many cases data is skewed and some categories occur more frequently than others. In such case generic hash function will not help. Can you try to sample data and see if the data is equally

Re: datatorrent gateway application package problem

2016-09-28 Thread Vlad Rozov
Check namenode logs. Are there any exceptions/errors raised at the same time as the gateway exception? On 9/28/16 14:43, d...@softiron.co.uk wrote: On 2016-09-28 14:09, David Yan wrote: dtgateway does not use the native libraries. I just tried a clean install of RTS 3.5.0 community edition

Re: Apex Application Error

2016-09-05 Thread Vlad Rozov
Please check your application package (.apa) and application pom.xml for hadoop/yarn libraries. All hadoop dependencies must be excluded as they are provided at run-time. Vlad On 9/4/16 16:14, JOHN, BIBIN wrote: Could you please help me to identify the root cause for below failure?

Re: DT Exception while streaming byte array to object array

2016-08-18 Thread Vlad Rozov
These errors mean that a downstream operator exited without closing/disconnecting socket to the buffer server. Please check application master log to identify container/operator that fails first. Vlad On 8/18/16 10:57, Mukkamula, Suryavamshivardhan (CWM-NR) wrote: Hi Team, Can you please

Re: Information Needed

2016-08-02 Thread Vlad Rozov
Both setup() and beginWindow() should work. It will be more correct to open the configuration stream and parse the configuration file in setup() as you tried in the initial implementation as long as configuration path does not depend on window Id. Where the inputConfStream is used? Most likely