Re: Catastrophic Error: Execution halted due to Kryo exception! Encountered unregistered class ID 12

2018-08-24 Thread Vlad Rozov
Or apex-core/engine/src/main/java/com/datatorrent/stram/plan/logical/DefaultKryoStreamCodec.java that is part of the core and has https://issues.apache.org/jira/browse/APEXCORE-502 fixed. Thank you, Vlad > On Aug 22, 2018, at 22:20, T

Re: Buffer server overflow

2018-06-20 Thread Vlad Rozov
When spilling to disk is enabled, an upstream operator will be blocked from emitting more tuples to a corresponding output port when the size of a buffer (in bytes) exceeds a limit (see documentation on how to configure the limit). This is a back pressure mechanism that Pramod refers to. There

Re: Kryo version and default serializer

2018-06-20 Thread Vlad Rozov
Hi Aaron, Nice to see more contributors to the Apex! Welcome. I'd suggest that we stop this e-mail thread and continue discussion on the other dev@apex only e-mail thread. Usually users@apex is not a good choice to discuss dependency upgrades. Please be patient, majority of Apex and Apache c

Re: High CPU % even after increasing VCORES

2018-05-01 Thread Vlad Rozov
I am not sure I understand your assumptions regarding VCORES and CPU usage and/or number of tuples emitted per second. Operators and ports in Apex have high thread affinity, so with a single operator deployed into a container, it does not matter how many VCORES that container has. Thank you,

Re: TimeBasedDedupOperator failing with NullPointer

2018-03-23 Thread Vlad Rozov
IMO, it is a regression introduced in 3.8.0, not a configuration/infrastructure or a NULL key issue. File a JIRA. Thank you, Vlad On 3/23/18 10:07, Vivek Bhide wrote: Hi Vlad, Yes, its 3.8.0 Regards Vivek -- Sent from: http://apache-apex-users-list.78494.x6.nabble.com/

Re: TimeBasedDedupOperator failing with NullPointer

2018-03-23 Thread Vlad Rozov
Most likely the NPE is caused by a bug (regression introduced by APEXMALHAR-2492 Correct usage of empty Slice in Malhar Library). What is operator library version (3.8.0)? Thank you, Vlad On 3/23/18 05:48, Bhupesh Chawda wrote: Hi Vivek, Can you send across few more details, like the config

Re: How to specify more than one fields as a dedup keyExpression

2017-10-26 Thread Vlad Rozov
It will be better to return an instance of a class that implements hash() and equals() as a key instead of a String. Even for a key that has two integers benefit over million tuples may be significant. Looking at the source code, it looks that support for {$} notation was removed. I did not te

Re: How to specify more than one fields as a dedup keyExpression

2017-10-23 Thread Vlad Rozov
I don't think that Apex expression evaluator is that smart :). Try "{$}.getId() + {$}.getId1()" or provide a getter that returns pair object. Thank you, Vlad On 10/23/17 18:13, Munagala Ramanath wrote: It needs to be an expression that combines both (or all) values: try "id + id1" Ram On

Re: Apache Apex with Apache slider

2017-08-04 Thread Vlad Rozov
What will be a reason to use Apex cli over Launcher API? Thank you, Vlad On 8/1/17 11:59, Thomas Weise wrote: Slider let's you package the binaries (along with the .apa), so that would eliminate the need for any extra install. On Tue, Aug 1, 2017 at 11:54 AM, Pramod Immaneni mailto:pra...@

Re: Processing speed units

2017-06-16 Thread Vlad Rozov
Questions related to a vendor specific tool do not belong to the list. Please check directly with the vendor. Thank you, Vlad On 6/15/17 22:53, Anushree Gupta wrote: Hi All On Data Torrent console the processing speed of an application is given in a format like 'x processed/s' or 'y emitted/

Re: Is there a way to schedule an operator?

2017-06-15 Thread Vlad Rozov
I agree. IMO, application scheduling is not part of a streaming engine functionality and there are plenty of other projects that can help with it. A streaming engine whether in batch or streaming use case needs to support - watermarks and triggers (with few exceptions mostly supported by Apex)

Re: Is there a way to schedule an operator?

2017-06-14 Thread Vlad Rozov
Use Apex client to start application or use REST API to start application if you have DataTorrent RTS license from you cron job. Thank you, Vlad On 6/13/17 15:30, Guilherme Hott wrote: I was thinking about to use a cron sending a kafka message and in my DAG I'll have a kafka input operator to

Re: Hdfs + apex-core

2017-06-06 Thread Vlad Rozov
Do you have partition.assignment.strategy set anywhere in the Kafka operator properties? Thank you, Vlad On 6/5/17 10:14, userguy wrote: Did any one got chance to look at the code .. or do we have template to read from multiple topics in kafka ??? -- View this message in context: http:/

Re: Container restart after operator failure

2017-06-06 Thread Vlad Rozov
It is a portion of the application master log. In the same log, check what container fails first and check that container for errors and exceptions. apex/malhar developers, any volunteer to see why the operator fails during recovery? Thank you, Vlad On 6/5/17 23:17, Priyanka Gugale wrote:

Re: To run Apex runner of Apache Beam

2017-06-05 Thread Vlad Rozov
log and stack trace On Thursday, June 1, 2017 1:31 PM, Vlad Rozov mailto:v.ro...@datatorrent.com>> wrote: Hi Claire, Can you provide maven logs? It may also help to obtain stack trace. In a separate terminal window, run "jps -lv" and look

Re: To run Apex runner of Apache Beam

2017-06-01 Thread Vlad Rozov
Hi Claire, Can you provide maven logs? It may also help to obtain stack trace. In a separate terminal window, run "jps -lv" and look for the jvm process id that executes org.apache.beam.examples.TfIdf. Use jstack to get the stack trace. Thank you, Vlad On 6/1/17 13:00, Claire Yuan wrote:

Re: Container failure without relaunch

2017-05-31 Thread Vlad Rozov
It may also help to enable DEBUG level logging for com.datatorrent.* once the issue is reproduced again and check activity in the application master logs. Thank you, Vlad On 5/30/17 10:41, Sandesh Hegde wrote: When that issue happens, please check the free resource(CPU and memory) available

Re: Exceeded allowed memory block for AvroFileInputOperator

2017-05-30 Thread Vlad Rozov
The warning means that the downstream operator(s) is not capable to keep up with the upstream operator and tuples are accumulated in the upstream buffer server leading to the increased latency. If the latency is within SLA, you may ignore the warning, otherwise try to partition downstream or de

Re: Window id got stuck

2017-05-11 Thread Vlad Rozov
ed running in an other container.. > > > What is the reason for window id getting stuck in 3.2 version. > > > >> On Wed, May 10, 2017 at 9:05 PM, Vlad Rozov wrote: >> Is HDFS write operator partitioned? If not, in 3.2 release Apex deploys >> unifier fo

Re: Error while using AvroToPojo operator

2017-05-10 Thread Vlad Rozov
Do not use janino commons-compiler-jdk, it has a bug that prevents it from properly working with PojoUtils. Replace it with janino commons-compiler. Thank you, Vlad On 5/10/17 13:53, bhidevivek wrote: I am trying to read the hdfs audit logs from avro format and trying to convert it to POJO u

Re: Apex + DT RTS

2017-05-10 Thread Vlad Rozov
BigTop? Thanks On 2017-05-10 15:54 (-0700), Vlad Rozov wrote: It is a source release download, not a binary download. You will need to build binaries. Also, documentation assumes an installation from a binary download. Thank you, Vlad On 5/10/17 15:46, apex user wrote: I downloaded from https

Re: Apex + DT RTS

2017-05-10 Thread Vlad Rozov
- 3.6.0(apex-3.6.0-source-release.zip) Thanks On 2017-05-10 15:44 (-0700), Vlad Rozov wrote: Which download do you use? Thank you, Vlad On 5/10/17 14:43, apex user wrote: Hi All, Can I work with Apache Apex independently without installing DataTorrent RTS binaries? I see from the Apache

Re: Apex + DT RTS

2017-05-10 Thread Vlad Rozov
Which download do you use? Thank you, Vlad On 5/10/17 14:43, apex user wrote: Hi All, Can I work with Apache Apex independently without installing DataTorrent RTS binaries? I see from the Apache Apex Documentation, I can access Apex CLI from /bin/apex. But, when I download it from Apache Ap

Re: Window id got stuck

2017-05-10 Thread Vlad Rozov
, chiranjeevi vasupilli wrote: The upstream operator processing fine, there are no exceptions and window id keep moving. Apex version: 3.2.2-incubating-SNAPSHOT Thanks Chiranjeevi V On Tue, May 9, 2017 at 10:12 PM, Vlad Rozov <mailto:v.ro...@datatorrent.com>> wrote: Sorry, I mean upstream

Re: Window id got stuck

2017-05-09 Thread Vlad Rozov
a separate containers. There is no functional processing happening in the blocked operators. There is no downstream operators. Please suggest. On Mon, May 8, 2017 at 8:58 PM, Vlad Rozov <mailto:v.ro...@datatorrent.com>> wrote: The exception below means that a downstream

Re: Window id got stuck

2017-05-08 Thread Vlad Rozov
The exception below means that a downstream operator abruptly disconnected. It is not an indication of a problem by itself. Please check downstream operator container log for exceptions and error messages. Thank you, Vlad On 5/8/17 07:12, Pramod Immaneni wrote: Hi Chiranjeevi, I am assuming

Re: How to write data in ORC format to hdfs instead of text format

2017-04-27 Thread Vlad Rozov
Hi Rishi, Problem is that it is not possible to use AbstractFileOutputOperator to write to a columnar storage data formats such as Parquet or ORC. AbstractFileOutputOperator assumes row data formats. AFAIK, Malhar does not have output operators that support columnar storage, so it will be nec

Re: One Year Anniversary of Apache Apex

2017-04-25 Thread Vlad Rozov
Congratulations to the community! Thank you, Vlad On 4/25/17 08:14, Thomas Weise wrote: Hi, It's been one year for Apache Apex as top level project, congratulations to the community! I wrote this blog to reflect and look ahead: http://www.atrato.io/blog/2017/04/25/one-year-apex/ Your comme

Re: Wait until setup completes prior to dag execution

2017-04-21 Thread Vlad Rozov
One possible way is to introduce a delay operator. I am not 100% that it will work, but it is worth trying. Introduce a dummy output port on the operator that takes too long to setup. Connect it to an input port of a delay operator. Connect delay operator output port to a dummy input port of th

Re: How to write data in ORC format to hdfs instead of text format

2017-04-21 Thread Vlad Rozov
Both ORC and Parquet are columnar storage formats and require Output operators to understand it (do tuple to ORC column translation). Thank you, Vlad On 4/20/17 22:46, Priyanka Gugale wrote: You can treat your ocr file as binary file. AbstractFileOutputOperator can be used for binary files, i

Re: DB connection getting lost

2017-04-18 Thread Vlad Rozov
Do you call JdbcStore from an operator thread or do you share it across multiple threads? In the later case the behavior is undefined. Thank you, Vlad On 4/17/17 02:10, Priyanka Gugale wrote: Can you make sure you have called connect first before createStatement. Also if possible please share

Re: Restricting emit speed of AbstractFileInputOperator.

2017-04-10 Thread Vlad Rozov
If it takes t to emit one tuple and t*x < 0.5 sec, emitBatchSize can be set to x. This will help with CPU utilization as platform will put the operator thread to sleep if it does not emit and does not have handleIdleTime() callback defined. Thank you, Vlad On 4/7/17 18:53, Bhupesh Chawda wro

Re: Rest call to Data Torrent package deploy failing

2017-03-30 Thread Vlad Rozov
Hi Surya, This mailing list covers only Apache Apex questions. For questions related to a specific vendor distribution or vendor products, please contact corresponding vendor. In this particular case, please raise Datatorrent support ticket or contact Datatorrent. Thank you, Vlad /Join us

Re: 3.5.0 apex core build failing with hadoop 2.7.3 dependency

2017-03-23 Thread Vlad Rozov
YarnConfiguration is @Evolving :) Moving discussion to dev@apex. Thank you, Vlad // On 3/23/17 08:43, Pramod Immaneni wrote: So much for semantic versioning.. On Thu, Mar 23, 2017 at 7:15 AM, Munagala Ramanath mailto:r...@datatorrent.com>> wrote: Looks like this was a breaking change

Re: Blocked Operator

2017-03-07 Thread Vlad Rozov
Yes, for older releases operators may be marked as stuck due to delays in setup, but for the provided stack trace, it is a wait between heartbeats, not a wait for setup and activation. Thank you, Vlad On 3/7/17 06:50, Sandesh Hegde wrote: In older versions of Apex, heart beat loop waits for o

Re: Blocked Operator

2017-03-07 Thread Vlad Rozov
Please check stack trace of the thread where an operator runs. The main thread in a container is responsible for heartbeat monitoring and what you see in the stack trace is a wait between heartbeats. Thank you, Vlad /Join us at Apex Big Data World-San Jose

Re: Occasional Out of order tuples when emitting from a thread

2017-02-21 Thread Vlad Rozov
Please see Threads section at http://apex.apache.org/docs/apex/development_best_practices/ Note that an operator has only one thread created by the Apex that it can use to interact with the system. Starting with 3.5.0 release, this is enforced by the platform and it will throw an exception if

Re: Debugging operator latency.

2017-02-21 Thread Vlad Rozov
The name of the property is dt.attr. CONTAINER_JVM_OPTIONS. You don't need to replace with the actual path as it is not known till an application and a containers are allocated by Yarn (Apex will replace with the container log directory). You may also set the attribute at the launch time: -D

Re: Debugging operator latency.

2017-02-17 Thread Vlad Rozov
I would start with taking few stack traces (the latest Datatorrent RTS allows to take a stack trace otherwise use jstack) of the container where the operator is deployed and trying to identify what operator's thread is doing. If there is nothing obvious there, proceed as Ashwin suggested. Tha

Re: Logging

2017-01-08 Thread Vlad Rozov
One option is to use SocketAppender, but I will not recommend using real time logging in one place for production - it is not scalable. At some point logging to a single place will lead to application performance degradation. Thank you, Vlad On 1/5/17 06:11, Doyle, Austin O. wrote: What’s

Re: Data duplication between operators

2016-12-19 Thread Vlad Rozov
This will be a bug unless the downstream operator constantly fails and is restored to a checkpoint in which case it is expected that it may get the same tuple multiple times. Thank you, Vlad On 12/19/16 11:33, Doyle, Austin O. wrote: I’m trying to send some sequential data between an S3 Inp

Re: Exception with API

2016-12-12 Thread Vlad Rozov
Hi Brandon, Check the .apa package validity. Try to manually copy the .apa file to the node where gateway is installed, start apex using - option and run "get-app-package-info ". Thank you, Vlad On 12/12/16 07:20, Feldkamp, Brandon (CONT) wrote: Hello, I’m getting the below exception

Re: Application level logs

2016-12-12 Thread Vlad Rozov
AFAIK, Apex delegates log management to Hadoop/Yarn and the application. It is possible to retain logs both with log aggregation enabled and disabled. If aggregation is enabled, Yarn will consolidate logs from all nodes once application is completed and delete them after yarn.log-aggregation.re

Re: Query

2016-12-06 Thread Vlad Rozov
I'd recommend to use additional output port solution outlined by Bhupesh. There are few Apex applications on the field that leverage that solution. Thank you, Vlad On 12/4/16 11:45, Vishal Agrawal wrote: Thank you Bhupesh and Ram. Appreciate your quick response. I see ThrottlingStatsListene

Re: Connection refused exception

2016-12-01 Thread Vlad Rozov
Nov 30, 2016 at 8:20 AM, Vlad Rozov <mailto:v.ro...@datatorrent.com>> wrote: Hmm, that is strange. I would expect an ERROR in the KafkaInputOperator around 21:01. Please check the earliest killed container log. 2016-11-24 21:01:01,839 [IPC Ser

Re: Packaging Operator Jars

2016-11-30 Thread Vlad Rozov
It is necessary to exclude hadoop jars from the application package. Please see http://docs.datatorrent.com/troubleshooting/#hadoop-dependencies-conflicts Thank you, Vlad On 11/30/16 12:16, Doyle, Austin O. wrote: I was wondering if there was a way to package operator jars with their depen

Re: Connection refused exception

2016-11-30 Thread Vlad Rozov
access$500(StreamingContainer.java:130) at com.datatorrent.stram.engine.StreamingContainer$2.run(StreamingContainer.java:1438) *From: *Vlad Rozov *Organization: *DataTorrent *Reply-To: *"users@apex.apache.org" *Date: *Tuesday, November 29, 2016 at 10:21 PM *To: *"

Re: Connection refused exception

2016-11-29 Thread Vlad Rozov
Brandon, Is the stacktrace from the app master log or is it from a local mode run? The exception is raised due to the buffer server termination in one of the operators containers and I guess that it happens when app master tries to purge data from the buffer server. Check the operators contai

Re: error launching in docker sandbox

2016-11-29 Thread Vlad Rozov
Hi Brandon, No, it should not. Start apex with - option that will provide more details in the log. Thank you, Vlad On 11/29/16 14:07, Feldkamp, Brandon (CONT) wrote: Hello! I’m trying to launch my app in the docker sandbox but I’m getting an error. I’m able to launch the app elsewher

Re: KryoException to write Hbase

2016-10-24 Thread Vlad Rozov
The proper way to initialize gson field is to use setup(): public class HbaseTableUpdate extends AbstractHBasePutOutputOperator { // Serializable not needed ... private transient Gson gson; ... @Overwrite public void setup(...) { gson = new Gson(); Vlad On 10/20/16 22:19, Tushar Go

Re: balanced of Stream Codec

2016-10-17 Thread Vlad Rozov
Using different hash function will help only in case data is equally distributed across categories. In many cases data is skewed and some categories occur more frequently than others. In such case generic hash function will not help. Can you try to sample data and see if the data is equally dis

Re: Apex and Malhar Java 8 Certified

2016-10-05 Thread Vlad Rozov
to use Java 8 features. Thanks! Brandon *From: *Vlad Rozov mailto:v.ro...@datatorrent.com>> *Organization: *DataTorrent *Reply-To: *"users@apex.apache.org <mailto:users@apex.apache.org&g

Re: Reading compressed file using FileSplitter

2016-10-03 Thread Vlad Rozov
Another option is https://github.com/dain/snappy. Thank you, Vlad On 10/3/16 04:15, Priyanka Gugale wrote: Looks like hadoop commons compressor has read only support for snappy. So if your usecase is just to decompress you can try it out. There is an example on this page: http://commons.apac

Re: Apex and Malhar Java 8 Certified

2016-10-03 Thread Vlad Rozov
We do test on Java 8 - both Apex Core and Malhar Apache Jenkins builds use JDK 1.8 to run tests. Thank you, Vlad On 10/3/16 15:45, Thomas Weise wrote: Apex is built against Java 7 and expected to work as is on Java 8 (Hadoop distros still support 1.7 as well). Are you running into specific i

Re: datatorrent gateway application package problem

2016-09-28 Thread Vlad Rozov
Check namenode logs. Are there any exceptions/errors raised at the same time as the gateway exception? On 9/28/16 14:43, d...@softiron.co.uk wrote: On 2016-09-28 14:09, David Yan wrote: dtgateway does not use the native libraries. I just tried a clean install of RTS 3.5.0 community edition on

Re: Data torrent application connecting to Mapr streams

2016-09-21 Thread Vlad Rozov
Please also check application package (.apa) for any hadoop dependency. They all must be excluded in the application pom as Apex will provide them at run-time. Thank you, Vlad On 9/21/16 08:27, Vlad Rozov wrote: Is gateway/apex installed on edge node of mapr cluster? Most likely this is

Re: Data torrent application connecting to Mapr streams

2016-09-21 Thread Vlad Rozov
Is gateway/apex installed on edge node of mapr cluster? Most likely this is compatibility issue between jars installed on gateway and the actual cluster. The root cause of the application failure is the following exception on Apex application master java.lang.NoSuchFieldError: IS_WINDOWS a

Re: Apex Application Error

2016-09-05 Thread Vlad Rozov
Please check your application package (.apa) and application pom.xml for hadoop/yarn libraries. All hadoop dependencies must be excluded as they are provided at run-time. Vlad On 9/4/16 16:14, JOHN, BIBIN wrote: Could you please help me to identify the root cause for below failure? Contain

Re: Change Logging Levels on Running App with Custom Log Properties

2016-09-02 Thread Vlad Rozov
It is possible to change the log level of loggers independently of how initial level was set. Do you see log level change INFO messages in the dt.log? Thank you, Vlad On 8/30/16 11:34, McCullough, Alex wrote: Hey Everyone, If I have created a custom log4j properties in my application, whic

Re: DT Exception while streaming byte array to object array

2016-08-18 Thread Vlad Rozov
These errors mean that a downstream operator exited without closing/disconnecting socket to the buffer server. Please check application master log to identify container/operator that fails first. Vlad On 8/18/16 10:57, Mukkamula, Suryavamshivardhan (CWM-NR) wrote: Hi Team, Can you please thro

can operators emit on a different from the operator itself thread?

2016-08-10 Thread Vlad Rozov
The short answer is no, creating worker thread to emit tuples is not supported by Apex and will lead to an undefined behavior. Operators in Apex have strong thread affinity and all interaction with the platform must happen on the operator thread. Vlad

Re: Input Operator Activate vs Setup

2016-08-05 Thread Vlad Rozov
For input operator there will be no much difference, but in case connection to IMDG is needed in other operators (for example to lookup extra info), it will be better to establish connection in setup() to avoid unnecessary dropping and re-establishing connection in activate/deactivate if upstre

Re: Information Needed

2016-08-02 Thread Vlad Rozov
Needed Hi, inputConfStream is used to parse the input line from the feed. This is used for all the lines from the feed. Not sure why the stream is getting closed? Regards, Surya Vamshi *From:*Vlad Rozov [mailto:v.ro...@datatorrent.com] *Sent:* 2016, August, 02 4:26 PM *To:* users@apex.apa

Re: Information Needed

2016-08-02 Thread Vlad Rozov
Both setup() and beginWindow() should work. It will be more correct to open the configuration stream and parse the configuration file in setup() as you tried in the initial implementation as long as configuration path does not depend on window Id. Where the inputConfStream is used? Most likely

Re: delegation token not found in cache

2016-07-24 Thread Vlad Rozov
Please open JIRA against Apex Core and attach relevant logs. Thank you, Vlad On 7/21/16 22:14, Raja.Aravapalli wrote: Hi, Can someone please help me why my DAG is failing with below exception after 8 days running succesfully!! Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.