Re: How to specify more than one fields as a dedup keyExpression

2017-10-26 Thread Munagala Ramanath
Yes, if you use "+" in your expression, then the numeric sum will be computed if the fields areintegers and the catenation if they are strings; the former will not yield the desired uniqueness.But now that I think about it some more, even the latter will not, here's why: If the fields in

Re: How to specify more than one fields as a dedup keyExpression

2017-10-25 Thread Munagala Ramanath
The mapping: tuple -> dedup-key needs to be 1-1; if multiple tuples are mapped to the same dedup key you'll see problems like this. In your case multiple tuples can be mapped to the same value of "id + id1". For example, all tuples with (id, id1) being any of these pairs will all map to a

Re: How to specify more than one fields as a dedup keyExpression

2017-10-23 Thread Munagala Ramanath
It needs to be an expression that combines both (or all) values: try "id + id1" Ram On Monday, October 23, 2017, 6:04:14 PM PDT, Vivek Bhide wrote: Thanks Ram for your suggestions Field types that I am trying are the basic primitive types. In fact, I was just

Re: How to specify more than one fields as a dedup keyExpression

2017-10-23 Thread Munagala Ramanath
Could you provide some details on what types your multiple keys are and what expression variants you tried and what the result was ? The base class of BoundedDedupOperator is AbstractDeduper; within that class you'll see an method getKey(); you should be able to override that to retrieve the

Re: problems viewing the Datatorrent application logs

2017-03-23 Thread Munagala Ramanath
If log file aggregation is enabled, aggregated logs for terminated logs should be available on HDFS and retrievable with: *yarn logs -applicationId *** Log aggregation is discussed further at: https://hortonworks.com/blog/simplifying-user-logs-management-and-access-in-yarn/ and is controlled

JavaDocs for Apex Core now available

2017-03-17 Thread Munagala Ramanath
https://ci.apache.org/projects/apex-core/apex-core-javadoc-release-3.4/index.html https://ci.apache.org/projects/apex-core/apex-core-javadoc-release-3.5/index.html ... at the above links. Ram -- ___ Munagala V. Ramanath Software Engineer

Re: Preferred way to write compressed files to HDFS

2017-03-13 Thread Munagala Ramanath
Take a look at the testCompression() method in AbstractFileOutputOperatorTest.java for an example. Ram On Mon, Mar 13, 2017 at 5:25 PM, Ganelin, Ilya wrote: > What is the recommended way to write compressed data to HDFS? Should I > extend AbstractFileOutputOperator

Re: Blocked operator PTOperator

2017-03-01 Thread Munagala Ramanath
g from the code above which > starts > > > > with 'Marking operator ' > > > > > > > > Regards, > > > > Ashwin. > > > > > > > > On Tue, Feb 28, 2017 at 12:03 PM, Sunil Parmar < > spar.

Re: Blocked operator PTOperator

2017-02-28 Thread Munagala Ramanath
It likely means that that operator is taking too long to return from one of the callbacks like beginWindow(), endWindow(), emitTuples(), etc. Do you have any potentially blocking calls to external systems in any of those callbacks ? Ram On Tue, Feb 28, 2017 at 11:09 AM, Sunil Parmar

Re: Occasional Out of order tuples when emitting from a thread

2017-02-21 Thread Munagala Ramanath
gt; public transient DefaultInputPort kafkaStreamInput = >> >> new DefaultInputPort() { >> >> List errors = new ArrayList(); >> >> @Override >> >> public void process(String consumerRecord) { >> >>

Re: Running shell commands which fork from within Operator exits with 143 exit code

2017-02-10 Thread Munagala Ramanath
Looks like you may be blocking the operator thread with the p.waitFor() and the rest of the code to process the child process output. Try using a separate thread to handle the child as described, for example, here: http://stackoverflow.com/questions/26319804/adapting-c-fork-code-to-a-java-program

Re: webservice call to get application status for santry enabled DT

2017-02-06 Thread Munagala Ramanath
ail.com> wrote: > hI Ram, > > we running Data Torrent applicatoin using apex 3.2.1 build. > Santry i.e authentication required to access DT console. > > > > On Mon, Feb 6, 2017 at 10:16 PM, Munagala Ramanath <r...@datatorrent.com> > wrote: > >> Could you p

Re: webservice call to get application status for santry enabled DT

2017-02-06 Thread Munagala Ramanath
Could you provide more details on what you're running, i.e. which version of Apex, etc. ? Also, what is santry ? Do any of the other REST API calls work, like "/about" for instance ? Ram On Mon, Feb 6, 2017 at 8:35 AM, chiranjeevi vasupilli wrote: > bringing on top.. > >

Re: Apex Output operator Query

2016-12-23 Thread Munagala Ramanath
There are multiple approaches, each with its own tradeoffs. Here is first step: A1. Create a pair of in-memory non-transient queues to hold the tuples (non-transient because we want them to be checkpointed and restored on recovery from failure). A2. Create a separate thread that waits for

Re: JDBCtoJDBC Example

2016-12-21 Thread Munagala Ramanath
The operator has this comment at the top: ** This implementation is generic so it uses offset/limit mechanism for batching which is not optimal. Batching is* * * most efficient when the tables/views are indexed and the query uses this information to retrieve data.* * * This can be achieved in

Re: Shell script for launching and shutting down

2016-12-16 Thread Munagala Ramanath
In addition to the methods mentioned by others, there is also the REST API documented at: http://docs.datatorrent.com/dtgateway_api/ The calls: *POST /ws/v2/appPackages?merge={replace|fail|ours|theirs}* POST

Re: Application level logs

2016-12-12 Thread Munagala Ramanath
Once log aggregation is enabled as Priyanka described, you can retrieve logs for a specific application with a command like this: *yarn logs -applicationId application_1473359658420_0588* Ram On Mon, Dec 12, 2016 at 2:29 AM, Priyanka Gugale wrote: > You need to set hadoop

Re: Some Questions on consuming/restarting/monitoring Apex Kafka consumer

2016-12-09 Thread Munagala Ramanath
W.r.t. item 3, it may be an issue of whether the group is active or not and whether it has been flushed from the memory cache. Some discussion here: http://grokbase.com/t/kafka/users/15bsxxgp49/new-kafka-consumer-groups-sh-not-showing-inactive-consumer-gorup Ram On Thu, Dec 8, 2016 at 8:36 AM,

Re: Operators in Pending Deploy

2016-12-02 Thread Munagala Ramanath
Just to be sure, did that change resolve the issue ? Ram On Fri, Dec 2, 2016 at 11:54 AM, Max Bridgewater wrote: > OK. Thanks guys. It continued to fail with min 256 and max 512 or 1024. In > the end I switched off the check of ratio virtual/physical memory. My >

Re: Apex Communication Protocols

2016-11-25 Thread Munagala Ramanath
A brief description is here: http://docs.datatorrent.com/beginner/#buffer-server It is indeed part of apex-core; see: https://github.com/apache/apex-core/tree/master/engine/src/main/java/com/datatorrent/stram/ as well as the *stream* subdirectory. Ram On Fri, Nov 25, 2016 at 11:13 AM, Max

Re: monitoring STRAM events for killed containers

2016-11-25 Thread Munagala Ramanath
One way is to use an external process that invokes appropriate REST API calls and checks results. An sample Python script to do this for the application as a whole is at: https://github.com/DataTorrent/examples/blob/master/tools/monitor.py The REST API is documented at:

Re: SortedWordCount example is not running

2016-11-25 Thread Munagala Ramanath
The .apa file content can be listed with:* jar tvf ./target/*.apa* Could you post the output of "*mvn dependency:tree*" ? Ram On Fri, Nov 25, 2016 at 4:15 AM, prakashdutt...@hotmail.com < prakashdutt...@hotmail.com> wrote: > The apa lists message is showing as below > > apex> launch

Re: Apex with SSH Tunel

2016-11-21 Thread Munagala Ramanath
Since the diagnostic is about exceeding virtual memory limits, please see: http://docs.datatorrent.com/configuration/#yarn-vmem-pmem-ratio-tuning for an alternative solution. Ram On Mon, Nov 21, 2016 at 4:20 AM, Max Bridgewater wrote: > The issue turned out to be

Re: how to connect to database having the DB properties file outside the resources folder or applcation

2016-11-17 Thread Munagala Ramanath
For Req1, a better approach might be to have a single "error reporting" operator that connects to the DB; all the other operators can send custom error tuples to it with appropriate details of the type of error. That way, if there are issues that are DB-related, you have 1 place to debug them

Re: SortedWordCount example is not running

2016-11-15 Thread Munagala Ramanath
Couple of questions: Could you outline the steps you followed to run this application ? Do you have any custom code that is creating or modifying a Configuration object ? Can you try building and running some of the example programs at:

Re: Operators stay in PENDING_DEPLOY state

2016-11-05 Thread Munagala Ramanath
>I have set the YARN minimum allocation property *in property.xml of the >project*, which doesn't solve the problem. Setting it in your project will not help; you need to set it in yarn-site.xml and restart YARN. If you're running on the sandbox, the location of that file is in my earlier

Re: Operators stay in PENDING_DEPLOY state

2016-11-04 Thread Munagala Ramanath
To add to Ashwin's diagnosis, assuming you have *properties.xml* setup as described in that tutorial, you need to add to the YARN configuration file *yarn-site.xml* at: */sfw/hadoop/current/etc/hadoop* the following stanza: *yarn.scheduler.minimum-allocation-mb128* and restart the Hadoop

Re: balanced of Stream Codec

2016-10-15 Thread Munagala Ramanath
If you want round-robin distribution which will give you uniform load across all partitions you can use a StreamCodec like this (provided the number of partitions is known and static): *public class CatagoryStreamCodec extends KryoSerializableStreamCodec {* * private int n = 0;* * @Override* *

Re: SortedWordCount application error

2016-09-18 Thread Munagala Ramanath
604acf0e41 > Yes I have used the apex command to lauch the apa file. > > > Thanks > Sanal > > On Tue, Sep 13, 2016 at 1:29 AM, Munagala Ramanath <r...@datatorrent.com> > wrote: > >> Looks like this happens if jersey-server is not present >> (e.g. http://stacko

Re: Optional output ports.

2016-09-13 Thread Munagala Ramanath
Yes, if you have ports at least one must be connected if there are no annotations on them. The code is in LogicalPlan.validate() -- checkout the allPortsOptional variable. Ram On Tue, Sep 13, 2016 at 3:17 AM, Tushar Gosavi wrote: > Hi All, > > I have an input operator

Re: SortedWordCount application error

2016-09-12 Thread Munagala Ramanath
Looks like this happens if jersey-server is not present (e.g. http://stackoverflow.com/questions/8662919/jersey-no-webapplication-provider-is-present-when-jersey-json-dependency-added ) Have you made any changes to the pom.xml ? If so, can you post it here ? Also, can you tell us a bit more

Re: Apex Application failing to run.

2016-09-04 Thread Munagala Ramanath
It looks like there is not enough memory for all the containers. If you're running in the sandbox, how much memory is allocated to the sandbox ? Can you try increasing it ? There is also a detailed discussion of how to manage memory using configuration parameters in the properties files at:

Re: handling French Characters using AbstractFileInputOperator

2016-08-13 Thread Munagala Ramanath
If you are dealing with file data purely as byte arrays and copying them from one place to another, you need not worry about the language or charset since the bytes are preserved. If you are converting them to Strings explicitly or using classes that might do so implicitly, you need to specify an

Re: Round Robin Partitioning with Dynamic Partitioning

2016-08-12 Thread Munagala Ramanath
AM, McCullough, Alex < alex.mccullo...@capitalone.com> wrote: > Thanks Ram. > > > > If I didn’t want dynamic partitioning and just round robin on a fixed # of > partitions, can it just be set through a property? If so, what is the > property? > > > > *From: *Munagal

Re: can operators emit on a different from the operator itself thread?

2016-08-10 Thread Munagala Ramanath
For cases where use of a different thread is needed, it can write tuples to a queue from where the operator thread pulls them -- JdbcPollInputOperator in Malhar has an example. Ram On Wed, Aug 10, 2016 at 1:50 PM, hsy...@gmail.com wrote: > Hey Vlad, > > Thanks for bringing

Re: Input Operator Activate vs Setup

2016-08-05 Thread Munagala Ramanath
"relinquish" => "acquire" :-) Ram On Fri, Aug 5, 2016 at 6:01 AM, Chinmay Kolhatkar wrote: > Hi Alex, > > setup and activate both are guaranteed to be called before starting of > dataflow. Hence in your case either should be fine. > > There might be few hundred ms of

Re: DAG failing with "Invalid AMRMToken from app attempt_*" in log

2016-07-26 Thread Munagala Ramanath
Which version of Hadoop ? If it is 2.6.X, upgrading to 2.7.X might fix the problem, e.g. https://marc.info/?l=hadoop-user=142501302419779=2 Ram On Tue, Jul 26, 2016 at 1:59 PM, Raja.Aravapalli wrote: > > Hi, > > My DAG fails every week atleast once with the below

Re: Sub Partitioning the parallel partitions

2016-07-25 Thread Munagala Ramanath
One way is to have a pass-through operator X that is parallel partitioned like your B currently. Then, connect the output port of X to B and use a suitable partitioner for B to create as many partitions as you want: A -> X -> B -> C. Ram On Mon, Jul 25, 2016 at 9:41 AM, Yogi Devendra

Re: Container & memory resource allocation

2016-07-20 Thread Munagala Ramanath
hooting page sounds the appropriate location. I shall raise PR with > the given suggestions. > > --prad > > On Tue, Jul 19, 2016 at 5:49 AM, Munagala Ramanath <r...@datatorrent.com> > wrote: > > > There is already a link to a troubleshooting page at bottom of > > https:

Re: Container & memory resource allocation

2016-07-19 Thread Munagala Ramanath
There is already a link to a troubleshooting page at bottom of https://apex.apache.org/docs.html That page already has some discussion under the section entitled "Calculating Container Memory" so adding new content there seems like the right thing to do. Ram On Mon, Jul 18, 2016 at 11:27 PM,

Re: DAG is failing due to memory issues

2016-07-12 Thread Munagala Ramanath
Please see: http://docs.datatorrent.com/troubleshooting/#configuring-memory Ram On Tue, Jul 12, 2016 at 6:57 AM, Raja.Aravapalli wrote: > > Hi, > > My DAG is failing with memory issues for container. Seeing below > information in the log. > > > > Diagnostics:

Re: Inputs needed on File Writer

2016-07-12 Thread Munagala Ramanath
Please take a look at the Python script under https://github.com/DataTorrent/examples/tree/master/tools It uses the Gateway REST API to retrieve application info given the name; the id is part of that JSON object. Ram On Tue, Jul 12, 2016 at 6:58 AM, Mukkamula, Suryavamshivardhan (CWM-NR) <

Re: Issue with Inputting large 5GB file through input operator

2016-06-30 Thread Munagala Ramanath
To add some additional explanation to what Sandesh said, it looks like your operator is dying or getting killed after some time, so you should look at the Application Master logs to find out why this is happening. When it goes down, a new operator is created and state from an earlier checkpoint

Re: Reading gZip File from Http

2016-06-29 Thread Munagala Ramanath
Nothing available out-of-the-box but there are some pieces that may be useful: https://github.com/apache/apex-malhar/blob/master/library/src/main/java/com/datatorrent/lib/io/AbstractHttpInputOperator.java For the Zip part, there is an example using GZip for output here:

Re: Multiple directories

2016-06-16 Thread Munagala Ramanath
193c1c by jenkins source checksum > 48db4b572827c2e9c2da66982d14 > > 7626", > > "resourceManagerVersion": "2.7.1.2.3.2.0-2950", > >"resourceManagerVersionBuiltOn": "2015-09-30T18:20Z", > > "rmStateStoreName&quo

Re: Dynamic Application Modification

2016-06-10 Thread Munagala Ramanath
You don't need dynamic partitioning to achieve that topology. You can simply create your DAG as: A --> X --> Y and then set the *PARTITIONER* attribute on X as discussed in the "Advanced Features" section of the TopN words tutorial at: http://docs.datatorrent.com/tutorials/topnwords-c7/ The

Re: Capabilities for Dynamic Application Modifications

2016-06-07 Thread Munagala Ramanath
e > Create a stream > > > Thanks, > Junguk > > 2016-06-07 12:37 GMT-04:00 Munagala Ramanath <r...@datatorrent.com>: > >> Hi, >> >> For Dynamic Partitioning, please take a look at the example at: >> >> https://github.com/DataTorrent/examples/tree/

Re: Capabilities for Dynamic Application Modifications

2016-06-07 Thread Munagala Ramanath
Hi, For Dynamic Partitioning, please take a look at the example at: https://github.com/DataTorrent/examples/tree/master/tutorials/dynamic-partition as well as the Malhar class:

Re: Information Needed

2016-05-24 Thread Munagala Ramanath
ders ? > > > > Regards, > > Surya Vamshi > > > > *From:* Munagala Ramanath [mailto:r...@datatorrent.com] > *Sent:* 2016, May, 20 2:35 PM > *To:* us...@apex.incubator.apache.org > *Subject:* Re: Information Needed > > > > It appears that